What is the optimal solution for storing real time time-series in MariaDB / MySQL?

With just MySQL/MariaDB, I would employ:

  • High speed ingestion: http://mysql.rjweb.org/doc.php/staging_table
  • Summary tables (to make fetching the data much faster): http://mysql.rjweb.org/doc.php/summarytables
  • I would even consider not storing the raw data; instead summarize the data, then toss it. If this is practical, it avoids most of the questions you pose.
  • (If data needs purging): Rapid deletion of old data: http://mysql.rjweb.org/doc.php/partitionmaint
  • I would avoid FOREIGN KEYS because of the extra overhead. (Instead, I would debug the SQL.)
  • I would not use UUID keys; performance degenerates terribly in huge tables. (http://mysql.rjweb.org/doc.php/uuid)
  • I would avoid extra indexes -- Don't use AUTO_INCREMENT if some other column(s) is unique.
  • You mention Spatial -- Please elaborate. 2D lookups are tricky; SPATIAL is one approach; here is another: http://mysql.rjweb.org/doc.php/latlng

Your last paragraph throws in the kitchen sink of questions (Toku, MyRocks, archive, compression, history table). I'm surprised the posting has not be killed for being "too broad". Please elaborate on what your data and queries are like; otherwise, all we can do is throw a kitchen sink full of solutions.

You say "realtime", yet you need "thousands/sec". Can you allow 1 minute delay in realtime? 1 sec? You cannot get 1ms; 1s will be hard to achieve. How long does a burst last? What is a per-minute burst? 1K/sec will probably spill into the next couple of seconds. 6K/minute is not much trouble.

How many clients are storing data? Some solutions work well with a single client; different solutions are needed for multiple clients.

Keep in mind that benchmarks are tuned to show off one thing, and rarely match real life.


There are some big questions there that probably need closer examination than can be achieved here, as there are so many dependencies (realize that you know this!). There are a number of slide sets from presentations on the Percona Live and Percona Live Europe pages about time series that might help you get further down the track. For example, about using ClickHouse from Yandex

https://www.percona.com/live/17/program/schedule/time-series

https://www.percona.com/live/e17/program-open-source-databases

You might also find some of the blog posts interesting. This one looks at TokuDB versus InnoDB for a time series benchmark.

https://www.percona.com/blog/2013/09/05/tokudb-vs-innodb-timeseries-insert-benchmark/

Whereas this one looks at MongoDB and TokuMX https://www.percona.com/blog/2015/05/26/storing-time-series-data-with-mongodb-and-tokumx/

Hope these help.