The data stored in InfluxDB is generally a time-ordered sequence of data points. A dataset often has hundreds of millions of rows, each with a timestamp plus tags and fields. Data points are typically immutable and read-only: new points are continually appended to the measurement. With this much data, the data model needs careful design. In particular, decide which attributes to store as indexed tags and which to store as unindexed fields, because this choice is critical for query performance. Here are a few recommendations when you design an InfluxDB data model:
- If queries frequently filter on the data, consider storing it in a tag
- If queries use GROUP BY on the data, consider storing it in a tag
- If you want to use the data with an InfluxQL function, consider storing it in fields
- If you need data of non-string type, consider storing it in fields
- If the data has highly variable (dynamic) values, consider storing it in fields
Avoid creating too many series, and avoid tags that contain long strings such as hashes and universally unique identifiers (UUIDs); both can cause high memory usage for database workloads. One of the key factors affecting performance is high series cardinality, which often causes high RAM usage. According to the recent InfluxDB hardware sizing guidelines, approximately 2-4 GB of RAM is recommended for fewer than 100,000 unique series. When a measurement has even a few tags with highly dynamic values (for example, thousands of distinct values or more), memory usage can easily exceed 32 GB, because every new combination of tag values creates another series and drives up series cardinality. On the other hand, when tag keys and values are stored only once, they only add to storage and do not affect the memory footprint.
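To make the cardinality argument concrete, the sketch below counts distinct series in a batch of hypothetical `http_requests` points. It assumes a series is identified by its measurement and tag set (field keys, which some InfluxDB versions also count, are ignored), and it compares the actual count with the worst-case bound for one measurement: the product of the number of distinct values per tag key. The `make_points` helper and all names in it are illustrative.

```python
"""Minimal sketch: count series for a batch of points (tags only).

Assumption: a series is identified by measurement + tag set. Field values never
create new series, so fields are left out of this model entirely.
"""

import uuid
from math import prod
from typing import Dict, List, Set, Tuple

Point = Tuple[str, Dict[str, str]]  # (measurement, tag set)


def series_key(measurement: str, tags: Dict[str, str]) -> Tuple[str, Tuple[Tuple[str, str], ...]]:
    # Sort tag pairs so identical tag sets always map to the same key.
    return (measurement, tuple(sorted(tags.items())))


def series_cardinality(points: List[Point]) -> int:
    # Number of distinct measurement + tag set combinations.
    return len({series_key(m, tags) for m, tags in points})


def cardinality_upper_bound(points: List[Point]) -> int:
    # Worst case for a single measurement: every combination of observed tag values.
    values: Dict[str, Set[str]] = {}
    for _, tags in points:
        for key, value in tags.items():
            values.setdefault(key, set()).add(value)
    return prod(len(v) for v in values.values())


def make_points(hosts: int, samples_per_host: int, uuid_as_tag: bool) -> List[Point]:
    # Hypothetical workload: one point per request, tagged by host.
    points: List[Point] = []
    request = 0
    for h in range(hosts):
        for _ in range(samples_per_host):
            tags = {"host": f"server{h:03d}"}
            if uuid_as_tag:
                # Anti-pattern: a unique ID per point becomes part of the series key.
                # Deterministic UUIDs keep the example reproducible.
                tags["request_id"] = str(uuid.UUID(int=request))
            # Otherwise request_id would be written as a field, which adds no series.
            request += 1
            points.append(("http_requests", tags))
    return points


if __name__ == "__main__":
    for as_tag in (False, True):
        pts = make_points(hosts=100, samples_per_host=1000, uuid_as_tag=as_tag)
        print(
            f"request_id as tag={as_tag}: "
            f"series={series_cardinality(pts)}, upper bound={cardinality_upper_bound(pts)}"
        )
```

With 100 hosts and 1,000 points per host, keeping `request_id` out of the tag set yields 100 series, while turning it into a tag yields 100,000 series, one per point. The product bound of 10,000,000 overstates the real count here because each `request_id` appears with only one host, but it shows how quickly a single high-variance tag multiplies the series count.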