Handling very large data with mysql

  • Can MySQL reasonably perform queries on billions of rows? -- MySQL can 'handle' billions of rows. "Reasonably" depends on the queries; let's see them.

  • Is InnoDB (MySQL 5.5.8) the right choice for multi-billion rows? -- 5.7 has some improvements, but 5.5 is pretty good, in spite of being nearly 6 8 years old, and on the verge of no longer being supported.

  • Best data store for billions of rows -- If you mean 'Engine', then InnoDB.

  • How big can a MySQL database get before the performance starts to degrade -- Again, that depends on the queries. I can show you a 1K row table that will meltdown; I have worked with billion-row tables that hum along.

  • Why MySQL could be slow with large tables? -- range scans lead to I/O, which is the slow part.

  • Can Mysql handle tables which will hold about 300 million records? -- again, yes. The limit is somewhere around a trillion rows.

  • (for InnoDB tables which is my case) increasing the innodb_buffer_pool_size (e.g., up to 80% of RAM). Also, I found some other MySQL performance tunning settings here in Percona blog -- yes

  • having proper indexes on the table (using EXPLAIN on queries) -- well, let's see them. There are a lot of mistakes that can be made in this critical area.

  • partitioning the table -- "Partitioning is not a panacea!" I harp on that in my blog

  • MySQL Sharding -- Currently this is DIY

  • MySQL clustering -- Currently the best answer is some Galera-based option (PXC, MariaDB 10, DIY w/Oracle). Oracle's "Group Replication" is a viable contender.

  • Partitioning does not support FOREIGN KEY or "global" UNIQUE.

  • UUIDs, at the scale you are talking about, will not just slow down the system, but actually kill it. Type 1 UUIDs may be a workaround.

  • Insert and index-build speed -- There are too many variations to give a single answer. Let's see your tentative CREATE TABLE and how you intend to feed the data in.

  • Lots of joins -- "Normalize, but don't over-normalize." In particular, do not normalize datetimes or floats or other "continuous" values.

  • Do build summary tables

  • 2,3 million transactions per day -- If that is 2.3M inserts (30/sec), then there is not much of a performance problem. If more complex, then RAID, SSD, batching, etc, may be necessary.

  • deal with such volume of data -- If most activity is with the "recent" rows, then the buffer_pool will nicely 'cache' the activity, thereby avoiding I/O. If the activity is "random", then MySQL (or anyone else) will have I/O issues.

  • Shrinking the datatypes helps in a table like yours. I doubt if you need 4 bytes to specify fuel_type. There are multiple 1-byte approaches.


When collecting billions of rows, it is better (when possible) to consolidate, process, summarize, whatever, the data before storing. Keep the raw data in a file if you think you need to get back to it.

Doing that will eliminate most of your questions and concerns, plus speed up the processing.