How Does Table Partitioning Help?

The following is just insane ranting and raving...

If you leave all data in one table (no partitioning), you will have O(log n) search times using a key. Let's take the worst index in the world, the binary tree. Each tree node has exactly one key. A perfectly balanced binary tree with 268,435,455 (2^28 - 1) tree nodes would have a height of 28. If you split this binary tree into 16 separate trees, you get 16 binary trees, each with 16,777,215 (2^24 - 1) tree nodes and a height of 24. The search path is shortened by 4 nodes, a 14.2857% height reduction (4/28). If the search time is measured in microseconds, a 14.2857% reduction in search time is nil-to-negligible.
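
The arithmetic above can be checked with a small sketch. The function name `perfect_tree_height` is made up for illustration, and it assumes the node count has the form 2^h - 1 (a perfectly balanced binary tree), as in the example:

```python
def perfect_tree_height(node_count: int) -> int:
    """Height (number of levels) of a perfectly balanced binary tree.

    Assumes node_count == 2**h - 1 for some h, as in the example above.
    """
    return (node_count + 1).bit_length() - 1


# One big tree versus 16 equal partitions of the same data.
full_height = perfect_tree_height(2**28 - 1)   # 268,435,455 nodes
part_height = perfect_tree_height(2**24 - 1)   # 16,777,215 nodes per partition

# Relative shortening of the search path, in percent.
reduction_pct = (full_height - part_height) / full_height * 100

if __name__ == "__main__":
    print(full_height, part_height, round(reduction_pct, 4))
```

Splitting into 16 trees only removes log2(16) = 4 levels, no matter how large the original tree is.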

Now in the real world, a BTREE index would have tree nodes with multiple keys. Each BTREE search would perform a binary search within the page, with a possible descent into another page. For example, if each BTREE page contained 1024 keys, a tree height of 3 or 4 would be the norm, a short tree height indeed.

Notice that partitioning a table does not reduce the height of the BTREE, which is already small. With 260 million rows split into partitions, there is even a strong likelihood of ending up with multiple BTREEs of the same height as the original. A key search may have to pass through the root page of every BTREE, and only one of them will lead to the needed search range.
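
As a rough check of the BTREE claim, here is a simplified sketch. It assumes every page holds 1024 keys and fans out to 1024 child pages, which ignores fill factor, key size, and the exact keys-versus-children count of a real BTREE. The function name `btree_height` and the choice of 16 partitions are illustrative assumptions:

```python
def btree_height(row_count: int, keys_per_page: int = 1024) -> int:
    """Smallest number of levels whose capacity covers row_count keys.

    Simplified model: a tree of height h holds up to keys_per_page**h keys.
    """
    height = 1
    capacity = keys_per_page
    while capacity < row_count:
        capacity *= keys_per_page
        height += 1
    return height


total_rows = 260_000_000
partitions = 16  # assumed partition count, for illustration only

whole_table_height = btree_height(total_rows)
per_partition_height = btree_height(total_rows // partitions)

if __name__ == "__main__":
    print(whole_table_height, per_partition_height)
```

Under these assumptions both the unpartitioned index and each per-partition index come out at the same height, so the partitioned lookup saves no page reads on the way down.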

Now expand on this. All the partitions exist on the same machine. If you do not have separate disks for each partition, you will have disk I/O and spindle rotations as an automatic bottleneck outside of partition search performance.

In this case, partitioning by database does not buy you anything either if id is the only search key being utilized.

Partitioning of data should serve to group data that are logically and cohesively in the same class. Performance of searching each partition need not be the main consideration as long as the data is correctly grouped. Once you have achieved the logical partitioning, then concentrate on search time. If you are just separating data by id only, it is possible that many rows of data may never be accessed for reads or writes. Now, that should be a major consideration: Locate all ids most frequently accessed and partition by that. All less frequently accessed ids should reside in one big archive table that is still accessible by index lookup for that 'once in a blue moon' query.

The overall result should be at least two partitions: one for frequently accessed ids, and another for the rest of the ids. If the set of frequently accessed ids is fairly large, you could optionally partition that as well.
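
A minimal sketch of the hot/archive idea, assuming you can obtain a log of which ids were accessed. The names `find_hot_ids` and `partition_for`, and the threshold value, are hypothetical. In an actual database the split would be expressed through a partitioning key or separate tables, which this sketch does not attempt to show:

```python
from collections import Counter


def find_hot_ids(access_log, hot_threshold):
    """Return the set of ids accessed at least hot_threshold times."""
    counts = Counter(access_log)
    return {row_id for row_id, hits in counts.items() if hits >= hot_threshold}


def partition_for(row_id, hot_ids):
    """Route an id to the 'hot' partition or to the big 'archive' table."""
    return "hot" if row_id in hot_ids else "archive"


if __name__ == "__main__":
    # Hypothetical access log: ids 1 and 3 are read often, id 2 rarely.
    log = [1, 1, 1, 2, 3, 3, 3, 3]
    hot = find_hot_ids(log, hot_threshold=3)
    for row_id in (1, 2, 3, 99):
        print(row_id, partition_for(row_id, hot))
```

Ids that never appear in the log fall through to the archive, which matches the 'once in a blue moon' case described above.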


200 million rows is certainly in the range where you could benefit from table partitioning. Depending on your application, you could get some of the benefits listed below:

  • Ease of purging old data: If you need to clear down records more than (say) 6 months old, you can partition the table on the date and then swap out older partitions. This is much faster than deleting data from a table and can often be done on a live system. In the OP's case this might be helpful for system maintenance.

  • Multiple disk volumes: Partitioning allows you to split data to distribute disk traffic across multiple disk volumes for speed. With a modern RAID controller this isn't likely to be an issue for the OP.

  • Faster table and range scans: Really, an operational system shouldn't be doing this sort of thing, but a data warehouse or similar system will do this sort of query in quantity. Table scans use mainly sequential disk traffic, so they are typically the most efficient way to process a query that returns more than a few percent of the rows in a table.

    Partitioning by a common filter (typically time or period based) allows large chunks of the table to be eliminated from such queries if the predicate can be resolved against the partitioning key. It also allows the table to be split over multiple volumes, which can give significant performance gains for large data sets. Normally, this is not an issue for operational systems.
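
To illustrate the purging and range-scan points in the list above, here is a toy in-memory model, not a database feature. The class `MonthlyPartitionedTable` and its methods are invented for this sketch, and it assumes rows are partitioned by calendar month:

```python
from datetime import date


class MonthlyPartitionedTable:
    """Toy model: rows are grouped into partitions keyed by (year, month)."""

    def __init__(self):
        self.partitions = {}  # (year, month) -> list of (day, payload)

    def insert(self, day: date, payload):
        self.partitions.setdefault((day.year, day.month), []).append((day, payload))

    def scan_range(self, start: date, end: date):
        """Return (rows, partitions_touched) for start <= day <= end.

        Partitions wholly outside the range are skipped without reading rows.
        """
        lo, hi = (start.year, start.month), (end.year, end.month)
        rows, touched = [], 0
        for key in sorted(self.partitions):
            if key < lo or key > hi:
                continue  # eliminated by the partitioning key alone
            touched += 1
            rows.extend(r for r in self.partitions[key] if start <= r[0] <= end)
        return rows, touched

    def drop_before(self, cutoff: date):
        """Purge old data by dropping whole partitions before the cutoff month."""
        cutoff_key = (cutoff.year, cutoff.month)
        dropped = sorted(k for k in self.partitions if k < cutoff_key)
        for key in dropped:
            del self.partitions[key]
        return dropped


if __name__ == "__main__":
    table = MonthlyPartitionedTable()
    for month in range(1, 13):
        table.insert(date(2023, month, 15), f"row-{month}")
    print(table.scan_range(date(2023, 3, 1), date(2023, 4, 30)))
    print(table.drop_before(date(2023, 7, 1)))
```

The range scan only looks inside partitions whose key overlaps the predicate, and the purge removes whole partitions rather than deleting row by row, which is the reason both operations are cheap in a partitioned table.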

For the OP's purposes partitioning isn't likely to achieve much performance benefit for operational queries, but it may be useful for system management. If there is any significant requirement to report aggregates across large volumes of data then an appropriate partitioning scheme may help with that.