InfluxDB: single or multiple measurements

This likely depends on your data; try both and compare the storage requirements. For example, if humidity does not change much, it makes sense to store it separately. But if several variables change at similar time intervals, it makes sense to combine them. It may also depend on your query patterns.

There is no right or wrong in going with either schema design, but one field value per measurement is the more appropriate approach.

Why?

Storing multiple field values in a measurement is a very relational-database way of thinking. A measurement should not be seen as a database table; it is a very different thing.

A measurement should be reserved explicitly for describing a type of data, like temperature or CPU usage.

If we design our schema with one field value per measurement, we can describe the data in plain English, like this:

At a certain point in time, the temperature is measured as data value=30. Notice the terms used here: point, data and measurement.

Whereas if you put multiple field values into a single measurement, you will find it difficult to describe the data in plain English.

InfluxDB is a time series database, so it makes sense to design the schema the time-series way.

Also, some time series data is measured down to microsecond precision. At such fine-grained timing, or even at millisecond precision, it is unlikely for a set of values to share the same timestamp. Hence designing it as one measurement containing a sequence of data points is always the better choice.

Bit of an old question but this is probably relevant to anyone working on TSDBs.

When I first started, my approach was to put every data point into a single-field measurement. The assumption was that I'd combine the data I needed in a SQL statement at a later date. However, as anyone who's used a TSDB like InfluxDB knows, there are some serious limitations on what one can do when retrieving data, because of the design choices made in implementing a TSDB.

As I've moved forward in my project, here are the rules of thumb I have developed:

A measurement should contain all the dimensions required for it to make sense but no more.

Example: imagine a gas flow meter which gives 3 signals:

volumetric flow
temperature
total flow

In this scenario, volumetric flow and temperature should be two fields of a single measurement, and total flow should be its own measurement.

(If the reader doesn't like this example, think of a home electric meter that outputs amps and volts, and kW and power factor.)
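As a rough sketch of what this schema could look like on the wire, the Python below formats points as InfluxDB line protocol strings. The measurement names (`gas_flow`, `gas_total`), the `meter` tag, and both helper functions are made up for illustration; the sketch only handles float fields and does no escaping of spaces or commas, which a real client library would take care of.

```python
def to_line(measurement, tags, fields, timestamp_ns):
    """Format one point as a line protocol string (float fields only, no escaping)."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


def gas_meter_lines(meter_id, flow, temp, timestamp_ns, total=None):
    """Flow and temperature share one point; total flow goes to its own measurement."""
    lines = [
        to_line(
            "gas_flow",  # hypothetical measurement name
            {"meter": meter_id},
            {"volumetric_flow": flow, "temperature": temp},
            timestamp_ns,
        )
    ]
    # Total flow is optional so it can be logged at a lower rate than the flow reading.
    if total is not None:
        lines.append(
            to_line("gas_total", {"meter": meter_id}, {"total_flow": total}, timestamp_ns)
        )
    return lines


for line in gas_meter_lines("m1", 12.5, 293.15, 1700000000000000000, total=1040):
    print(line)
```

Because volumetric flow and temperature are written in the same point, they are guaranteed to share one timestamp, while the total can be written whenever it is actually needed.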

Why would it be bad to store volumetric and temp in different series?

Timing: if you store those two signals in different series, they will have different index values (timestamps). Unless you take care to give them explicitly specified timestamps, you run the risk of them being sampled slightly out of step. This can very well end up being a Bad Thing (tm) because you might be introducing a systematic measurement bias in your data. Even if it's not a bad thing, it's going to be super annoying if you ever want to reuse this data later on (e.g. to dump it in a csv file).
Utility: if you want to deduce volumetric flow rate, you will have to compute constant * temp * volume to get a correct value. Doing this with two separate measurements becomes a nightmare because, for instance, InfluxQL does not support arithmetic across measurements. But even if it did, you'd have to make sure missing values of one of the fields aren't incorrectly handled and that grouping and aggregation are done right.
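To make the timing and utility points concrete, here is a small pure-Python sketch with made-up readings. It assumes the two series were written a few milliseconds apart, and the constant `K` is a placeholder rather than a real physical constant. No InfluxDB client is involved; the dictionaries only stand in for query results.

```python
# Hypothetical readings keyed by timestamp in milliseconds.
flow_series = {1000: 12.5, 2000: 12.7, 3000: 12.6}
temp_series = {1003: 293.1, 2004: 293.2, 3002: 293.2}  # written a few ms later


def join_exact(a, b):
    """Pair values only where both series have the very same timestamp."""
    return {t: (a[t], b[t]) for t in a if t in b}


def join_nearest(a, b, tolerance_ms):
    """Pair each value in a with the nearest one in b, if it is close enough.

    Assumes b is not empty.
    """
    joined = {}
    for t, value in a.items():
        nearest = min(b, key=lambda tb: abs(tb - t))
        if abs(nearest - t) <= tolerance_ms:
            joined[t] = (value, b[nearest])
    return joined


# The same readings stored as two fields of one point: no join needed.
combined = {
    1000: {"volumetric_flow": 12.5, "temperature": 293.1},
    2000: {"volumetric_flow": 12.7, "temperature": 293.2},
    3000: {"volumetric_flow": 12.6, "temperature": 293.2},
}

K = 0.5  # placeholder constant, purely illustrative
derived = {t: K * p["temperature"] * p["volumetric_flow"] for t, p in combined.items()}

print(len(join_exact(flow_series, temp_series)))       # exact join finds no pairs
print(len(join_nearest(flow_series, temp_series, 5)))  # depends on the tolerance
print(len(derived))                                    # one derived value per point
```

With separate series, the result depends on a tolerance you have to pick and defend; with both fields in one point, the derived value is a plain per-point calculation.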

Why would it be bad to store all three in a single measurement?

You may very well have a use case in which you want to audit all three values at all times, but chances are this is not the case and you don't care about measuring total volume at the same kind of frequency that you'd like to measure flow itself.

Putting all the fields in a single measurement will force you to either put nulls in certain fields, or to always log a variable that barely changes. Either way, it's not efficient.
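A back-of-the-envelope sketch of that inefficiency, assuming the flow and temperature are sampled every second and the total only once a minute (both rates are invented for illustration). It counts field values written, not bytes on disk; actual storage depends on how well the engine compresses slowly changing values.

```python
def stored_values(duration_s, flow_interval_s, total_interval_s):
    """Count field values written under the split and the all-in-one schema."""
    flow_points = duration_s // flow_interval_s
    total_points = duration_s // total_interval_s
    # Split schema: flow + temperature per flow point, total in its own measurement.
    split = 2 * flow_points + total_points
    # All-in-one schema: total is logged on every flow point as a third field.
    combined = 3 * flow_points
    return split, combined


split, combined = stored_values(duration_s=3600, flow_interval_s=1, total_interval_s=60)
print(split, combined)
```

Under these assumptions the all-in-one schema writes roughly half as many values again over an hour, most of them repeats of a total that has barely moved.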

The important insight is that multi-dimensional entities require all their dimensions at the same time to make sense.
