Database redesign opportunity: What table design to use for this sensor data collection?

There is one big reason you should think about partitioning the table.

Every index you have on a giant table, even just one index, can generate a lot of CPU load and disk I/O just to perform index maintenance when executing INSERTs, UPDATEs, and DELETEs.
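
To illustrate, here is a minimal sketch of what date-based range partitioning could look like; the names sensor_data_part, reading_time, and reading are assumptions for illustration, not your actual schema. Each partition carries its own much smaller local index, so index maintenance per statement touches far less data:

```sql
-- Minimal sketch only; all table and column names are hypothetical.
-- Each partition keeps its own smaller local index.
CREATE TABLE sensor_data_part (
    id INT NOT NULL,
    reading_time DATETIME NOT NULL,
    reading DOUBLE NOT NULL,
    KEY idx_id_time (id, reading_time)
) ENGINE=MyISAM
PARTITION BY RANGE (TO_DAYS(reading_time)) (
    PARTITION p201112 VALUES LESS THAN (TO_DAYS('2012-01-01')),
    PARTITION p201201 VALUES LESS THAN (TO_DAYS('2012-02-01')),
    PARTITION p201202 VALUES LESS THAN (TO_DAYS('2012-03-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);
```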

I wrote a post back on October 7, 2011 on why Table Partitioning would be a big help. Here is an excerpt from it:

Partitioning of data should serve to group data that logically and cohesively belong in the same class. Performance of searching each partition need not be the main consideration as long as the data is correctly grouped. Once you have achieved the logical partitioning, then concentrate on search time. If you are separating data by id only, it is possible that many rows of data may never be accessed for reads or writes. Now, that should be a major consideration: locate the ids most frequently accessed and partition by those. All less frequently accessed ids should reside in one big archive table that is still accessible by index lookup for that 'once in a blue moon' query.

You can read my entire post on this later.
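
To make the excerpt concrete, here is one way the "hot ids together" idea could be expressed with LIST partitioning; the table name and the specific ids are purely hypothetical:

```sql
-- Hypothetical sketch: ids 1-3 are the most frequently accessed
-- sensors; rows with ids outside the lists are rejected on INSERT.
CREATE TABLE sensor_hot_cold (
    id INT NOT NULL,
    reading_time DATETIME NOT NULL,
    reading DOUBLE NOT NULL,
    KEY idx_id (id)
) ENGINE=MyISAM
PARTITION BY LIST (id) (
    PARTITION p_hot  VALUES IN (1, 2, 3),
    PARTITION p_cold VALUES IN (4, 5, 6, 7, 8, 9, 10)
);
```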

To cut right to the chase, you need to research and find out what data is rarely used in your 10GB table. That data should be placed in an archive table that is readily accessible should you need ad hoc queries of a historical nature. Migrating that archival data out of the 10GB table, followed by OPTIMIZE TABLE on the 10GB table, can result in a Working Set that runs SELECTs, INSERTs, UPDATEs, and DELETEs much faster. Even DDL would go faster on a 2GB Working Set than on a 10GB table.
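
As a rough sketch, that migration could look like this, assuming a hypothetical sensor_data table with a reading_time column, and using a 90-day age cutoff as a stand-in for whatever "rarely used" means for your data:

```sql
-- All names here are illustrative, not your actual schema.
CREATE TABLE sensor_data_archive LIKE sensor_data;

-- Copy the rarely used rows into the archive...
INSERT INTO sensor_data_archive
SELECT * FROM sensor_data
WHERE reading_time < NOW() - INTERVAL 90 DAY;

-- ...remove them from the working set...
DELETE FROM sensor_data
WHERE reading_time < NOW() - INTERVAL 90 DAY;

-- ...and defragment the now-smaller working set.
OPTIMIZE TABLE sensor_data;
```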

UPDATE 2012-02-24 16:19 EDT

Two points to consider:

  1. From your comment, it sounds like normalization is what you may need.
  2. You may need to migrate everything over 90 days old out into an archive table while still being able to query the archive and the working set together. If your data is all MyISAM, I recommend using the MERGE storage engine. You create the MERGE table map once, uniting a working-set MyISAM table and an archive MyISAM table. Keep data less than 91 days old in the working-set MyISAM table and roll over any data more than 90 days old into the archive. Then query only the MERGE table map (see the sketch after this list).
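
Here is a minimal sketch of that setup. The names sensor_recent, sensor_archive, and sensor_all are hypothetical, and all three tables must have identical column and index definitions:

```sql
-- Working set: rows less than 91 days old (hypothetical names).
CREATE TABLE sensor_recent (
    id INT NOT NULL,
    reading_time DATETIME NOT NULL,
    reading DOUBLE NOT NULL,
    KEY idx_id_time (id, reading_time)
) ENGINE=MyISAM;

-- Archive: rows more than 90 days old, identical structure.
CREATE TABLE sensor_archive LIKE sensor_recent;

-- The MERGE table map; INSERT_METHOD=FIRST routes new rows to
-- sensor_recent. This is the only table you query.
CREATE TABLE sensor_all (
    id INT NOT NULL,
    reading_time DATETIME NOT NULL,
    reading DOUBLE NOT NULL,
    KEY idx_id_time (id, reading_time)
) ENGINE=MERGE UNION=(sensor_recent, sensor_archive) INSERT_METHOD=FIRST;

-- Nightly rollover of rows past the 90-day mark.
INSERT INTO sensor_archive
SELECT * FROM sensor_recent
WHERE reading_time < NOW() - INTERVAL 90 DAY;
DELETE FROM sensor_recent
WHERE reading_time < NOW() - INTERVAL 90 DAY;
```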

Here are two posts I made on how to use it:

  • get column from too many tables in mysql
  • Separate tables or partition one huge table?

Here is an additional post I made on tables with a lot of columns:

Too many columns in MySQL


Interesting... If all sensors produce the same kind of data, it does make sense to put them all in the same table, but with that amount of data, I can see why you'd be worried about performance.

Is 90 days the usual amount of time that you produce a graph for? If so, you could have two tables: the main sensor data table that stores data from 90 days ago (or a little more if you want some slack) up to today, and everything older than that goes into the archive table. That could help reduce the size of the table that reports are being generated from, and hopefully the majority of your 10 GB of data will be in the archive table, not in the main table. The archiving job can be scheduled to run nightly (see the sketch below).
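
As one way to schedule that job, here is a sketch using the MySQL event scheduler (which must be enabled with SET GLOBAL event_scheduler = ON); the table and column names are again hypothetical:

```sql
DELIMITER $$
-- Hypothetical nightly archiving event; assumes sensor_data and
-- sensor_archive tables sharing a reading_time column.
CREATE EVENT archive_old_sensor_data
ON SCHEDULE EVERY 1 DAY
STARTS CURRENT_DATE + INTERVAL 1 DAY + INTERVAL 2 HOUR  -- run at 02:00
DO
BEGIN
    INSERT INTO sensor_archive
        SELECT * FROM sensor_data
        WHERE reading_time < CURDATE() - INTERVAL 90 DAY;
    DELETE FROM sensor_data
        WHERE reading_time < CURDATE() - INTERVAL 90 DAY;
END $$
DELIMITER ;
```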

Maybe also consider building a separate reporting database that stores the data in a structure better suited to generating reports (tables designed to more closely match what you are querying, and, where possible, pre-calculated and aggregated values that would otherwise take a long time to generate), and re-populate it from the main database on a regular basis (such as nightly). Of course, if you need the reports generated from up-to-the-minute data, this might not work so well.
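
For example, a pre-aggregated daily summary table might look like this sketch (all names hypothetical), rebuilt nightly from the main database:

```sql
-- Hypothetical reporting table: one row per sensor per day.
CREATE TABLE report_daily_readings (
    sensor_id    INT NOT NULL,
    reading_date DATE NOT NULL,
    avg_reading  DOUBLE,
    min_reading  DOUBLE,
    max_reading  DOUBLE,
    PRIMARY KEY (sensor_id, reading_date)
);

-- Nightly rebuild from the main sensor_data table.
TRUNCATE TABLE report_daily_readings;
INSERT INTO report_daily_readings
SELECT id, DATE(reading_time),
       AVG(reading), MIN(reading), MAX(reading)
FROM sensor_data
GROUP BY id, DATE(reading_time);
```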