The raw or aggregated data from data collectors is stored in data stores such as SQL databases, NoSQL databases, data warehouses, and distributed systems like HDFS. If the data is unstructured, it may require some cleaning and preparation. The data can arrive in a variety of file formats, including database dumps, JSON files, Parquet files, Avro files, and even flat files. In distributed data storage systems, the data is distributed across the cluster upon ingestion and may be stored in different file formats.
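As a rough illustration of the cleaning and preparation step, the following minimal sketch loads the same kind of records from a JSON file and from a flat (CSV) file, then normalizes them into one list. The payloads, the field names `id` and `temp`, and the helper functions are all hypothetical and exist only for this example; real pipelines would typically read from files or streams and apply far richer validation.

```python
import csv
import io
import json

# Hypothetical sample payloads standing in for files received from collectors.
JSON_PAYLOAD = '[{"id": 1, "temp": "21.5"}, {"id": 2, "temp": null}]'
CSV_PAYLOAD = "id,temp\n3,19.0\n4,\n"


def load_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def load_csv(text):
    """Parse a flat CSV file with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(records):
    """Drop records with a missing reading and coerce fields to fixed types."""
    cleaned = []
    for rec in records:
        temp = rec.get("temp")
        if temp in (None, ""):
            continue  # skip incomplete rows
        cleaned.append({"id": int(rec["id"]), "temp": float(temp)})
    return cleaned


# Both sources end up in one uniform, typed structure ready for storage.
records = clean(load_json(JSON_PAYLOAD) + load_csv(CSV_PAYLOAD))
print(records)
```

Whatever the incoming format, the goal is the same: one consistent record shape before the data is written to the chosen data store.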
Some of the popular data stores widely used in industry are:
- RDBMS (relational database management system): RDBMSs are legacy storage options and are extremely popular in the data warehouse world. They store data while retaining the Atomicity, Consistency, Isolation, and Durability (ACID) properties. However, their downsides lie in handling high data volume and velocity.
- MongoDB: MongoDB is a popular document-oriented NoSQL database. It is widely adopted in the cloud computing world. It can handle data in any format, whether structured, semi-structured, or unstructured. With a high code push frequency, it is extremely agile and flexible. MongoDB is inexpensive compared with other monolithic data storage options.
- Bigtable: This is a scalable NoSQL database from Google. Bigtable is part of Google Cloud Platform (GCP). It scales seamlessly and offers very high throughput. Being part of GCP makes it easy to plug in behind other Google services and app platforms like Firebase. This is extremely popular among app makers, who use it to gather data insights. It is also used for business analytics.
- AWS Cloud Storage Services: Amazon Web Services (AWS) offers a range of cloud storage services for IoT devices, distributed data storage platforms, and databases. AWS data storage services are designed to provide strong security for cloud computing components.
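
The atomicity guarantee mentioned for RDBMS in the list above can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a lightweight stand-in for a full RDBMS; the `accounts` table, the account names, the amounts, and the `transfer` helper are hypothetical and chosen only for this example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a production RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit is undone too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits one account before the debit fails, yet the rollback leaves both balances exactly as they were after the first transfer. This all-or-nothing behavior is what the atomicity property refers to.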