Is the concept of a clustered index in a DB design sensical when using SSDs?

Ask yourself another question: If the entire database is in memory and I never have to touch the disk, do I want to store my data in an ordered B-tree or do I want to store my data in an unordered heap?

The answer to this question depends on your access pattern. In most cases your access requires single-row lookups (i.e. seeks) and range scans. These access patterns require a B-tree; otherwise they are inefficient. Some other access patterns, common in DW and OLAP, always aggregate over the entire table end to end, and they do not benefit from range scans. As you drill further, other requirements come to light: for example, the speed of insert and allocation into a heap vs. a B-tree may play a role for huge ETL transfer jobs. But most times the answer really boils down to one question: do you seek or range-scan? The overwhelming number of times the answer is YES, and therefore the overwhelming number of times the design requires a clustered index.
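To make the seek/range-scan point concrete, here is a minimal Python sketch. It is not how SQL Server works internally: a sorted list with `bisect` stands in for the B-tree, an unsorted list stands in for the heap, and the table contents and function names are made up for illustration. It only counts how many rows each layout has to examine for the same range query, with everything already in memory.

```python
import bisect

# Hypothetical (order_id, amount) rows; the names and values are illustrative only.
rows_heap = [(k, k * 10) for k in (42, 7, 19, 3, 88, 61, 25, 14)]  # unordered "heap"
rows_sorted = sorted(rows_heap)                                    # ordered by key
keys = [k for k, _ in rows_sorted]                                 # key column only

def heap_range_scan(rows, lo, hi):
    """Unordered storage: every row must be examined to answer the range query."""
    examined = 0
    matches = []
    for key, val in rows:
        examined += 1
        if lo <= key <= hi:
            matches.append((key, val))
    return sorted(matches), examined

def ordered_range_scan(rows, keys, lo, hi):
    """Ordered storage: binary-search to the start, then read only matching rows."""
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    return rows[start:end], end - start

heap_result, heap_examined = heap_range_scan(rows_heap, 10, 30)
ordered_result, ordered_examined = ordered_range_scan(rows_sorted, keys, 10, 30)
print(heap_examined, ordered_examined)
```

Both calls return the same rows, but the unordered version looks at the whole table while the ordered one reads only the rows in the range (plus the binary-search steps, which this sketch does not count). None of that depends on where the data is stored.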

In other words: just because it is cheap to read from disk in random order does not imply that you can thrash your TLBs and L2 lines in a 64 GB RAM scan bonanza...

If you use a well-chosen clustered index, you are more likely to get all the related data you need in fewer pages of data. That is, you can hold the data you need in less memory. This gives a benefit regardless of whether you use spinning disks or SSD.

But you're correct that the other benefit of a clustered index -- to read/write related data sequentially instead of with many disk seeks -- isn't a significant benefit for SSD, where seeks are not such a huge performance overhead as they are with spinning disks.

Re @Matthew PK's comment.

Of course location A in RAM is just as quick as location B in RAM. That's not the point. I'm talking about the case when all the data you need won't fit in RAM if the data is scattered among many pages. Any given page may contain only a small amount of data you're interested in. So the RDBMS has to keep loading and purging pages as you access A, B, and other rows. That's where you get the performance penalty.

It would be better for every page to be full of data you're interested in, in the hopes that all of the subsequent row requests are served from pages in RAM. Using a clustered index is a good way to ensure that your data is grouped together onto fewer pages.
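A rough way to see the "fewer pages" effect is to count how many pages hold the rows you want under each layout. The sketch below is a toy model, not a real storage engine: the page size of 4 rows, the `(customer_id, order_no)` key, and the function name are all assumptions chosen for illustration.

```python
ROWS_PER_PAGE = 4  # assumed page capacity, far smaller than a real page

def pages_touched(rows_in_storage_order, wanted, rows_per_page=ROWS_PER_PAGE):
    """Count the distinct pages that contain at least one wanted row."""
    pages = set()
    for position, row in enumerate(rows_in_storage_order):
        if row in wanted:
            pages.add(position // rows_per_page)
    return len(pages)

# 16 hypothetical (customer_id, order_no) rows for 4 customers.
# Heap: rows sit in arrival order, so customers are interleaved.
heap_order = [(customer, order) for order in range(4) for customer in range(4)]
# Clustered on (customer_id, order_no): each customer's rows are adjacent.
clustered_order = sorted(heap_order)

# All orders for customer 2.
wanted = {(2, order) for order in range(4)}

print(pages_touched(heap_order, wanted))       # pages needed with scattered rows
print(pages_touched(clustered_order, wanted))  # pages needed with clustered rows
```

In this toy layout the scattered rows are spread over every page, while the clustered rows fit on one. With a limited buffer pool, that difference decides whether the working set stays in RAM.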

Yes, it absolutely still does make sense. You're thinking too low-level in your approach. SQL Server (in a very very simplified explanation) stores clustered data in a B-tree architecture. This allows for fast data retrieval based on the clustered index key values.

A heap (no clustered index) has no sequential order of data. The most important thing to consider here is that in a heap the data pages are not linked in a linked list.

So the answer is yes, it still makes sense to have clustered indexes created on tables, even on an SSD. It's all based on how much data SQL Server has to sift through to get to the resulting data. With a clustered index seek, it is minimized.

Reference: http://msdn.microsoft.com/en-us/library/ms189051.aspx


