Identity column as clustered index bad idea?

I have never seen an identity column that is not also indexed, usually as the Primary Key.

Now we need to distinguish the Primary Key (PK) from the Clustered Index (CI). The first is all about the logic of the database schema: the Primary Key is what makes a row different from all the others in the table, and it is what the Foreign Keys of other tables reference. An identity column is always a Candidate Key, but it is artificial, and you may want a natural Candidate Key as the PK.

The Clustered Index, instead, is about how the data is physically ordered and stored. There can be only one clustered index per table, and it is the only index that holds the table's rows directly. All the other indexes point to rows through the clustered one.

Usually the PK is also the CI, but that is simply the default behaviour. I have seen, and sometimes created, PKs that were not the CI: the PK was the natural key and the CI was the identity column. The reason, simplifying how indexes work, is that the smaller the data in the CI definition, the faster the index, and the CI needs to be as fast as possible. So when the PK is huge, making an identity column the clustered index and making the PK non-clustered will improve performance.
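
To put rough numbers on the key-size argument: in SQL Server, each non-clustered index row carries the clustered key as its row locator, so a wide clustered key is repeated in every other index. The Python sketch below is back-of-the-envelope arithmetic only. The row count, number of indexes and the 60-byte natural key are made-up illustrative values, and the estimate ignores row overhead, compression and the upper B-tree levels.

```python
def locator_bytes(rows, nonclustered_indexes, clustered_key_bytes):
    """Bytes spent repeating the clustered key in all non-clustered indexes."""
    return rows * nonclustered_indexes * clustered_key_bytes


ROWS = 10_000_000   # hypothetical table size
NC_INDEXES = 4      # hypothetical number of non-clustered indexes

identity_cost = locator_bytes(ROWS, NC_INDEXES, 4)   # 4-byte INT identity as CI
natural_cost = locator_bytes(ROWS, NC_INDEXES, 60)   # assumed 60-byte natural key as CI

print(f"identity CI: {identity_cost / 1e6:.0f} MB of key copies")
print(f"natural CI:  {natural_cost / 1e6:.0f} MB of key copies")
```

With these assumed figures the narrow key costs about 160 MB of repeated key data against about 2400 MB for the wide one, which is the kind of gap that motivates keeping the CI small.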

So in my opinion using an identity column as the clustered index is not a bad idea, but that doesn't mean that it should also be the primary key.

The only scenario I can think of where an identity column can be a bad choice is when the volume of incoming data is so high that even generating the identity values hurts performance.


I usually use an identity column as the clustered primary key. However, in some (rare?) cases this is not ideal because of LastPageInsertLatchContention. This happens when a table is filled heavily with data: because of the identity key, all the INSERTs want to write to the last page of the table (index). That page becomes a point of contention, and performance may be better with another solution.

See

  • http://dangerousdba.blogspot.ch/2011/10/bit-reversion.html

  • http://blogs.msdn.com/b/sqlserverfaq/archive/2010/05/27/monotonically-increasing-clustered-index-keys-can-cause-latch-contention.aspx

  • http://www.sqlpassion.at/archive/2014/04/15/an-ever-increasing-clustered-key-value-doesnt-scale/

for details.
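
As an illustration of the kind of workaround discussed for this problem, the bits of a sequential value can be reversed so that consecutive inserts land far apart in the index instead of all on the last page. The following is a minimal Python sketch of the idea only; the function name and the 32-bit width are assumptions for the example, not code taken from the linked articles.

```python
def reverse_bits(value, width=32):
    """Return value with its lowest `width` bits in reversed order."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)  # move the lowest bit of value onto result
        value >>= 1
    return result


# Consecutive identity values map to widely separated keys.
for identity in (1, 2, 3):
    print(identity, "->", reverse_bits(identity))
```

The mapping is reversible (applying it twice gives back the original value), but the trade-off is that inserts are now effectively random, so the hot last page is exchanged for page splits across the whole index.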


Which keys/indexes to cluster is not an exact science - the best use of a clustered index can vary depending on the table's use (and the use of the columns in that key).

A clustered key is more efficient for queries that pick out many rows in a range, because no extra row lookups are needed to fetch the data for the rows found by searching the index. It helps single-row lookups too, but the difference is less noticeable. For instance, we have tables that are often searched by object owner ID (rather than object ID, which is the primary key), so it is more efficient for our app to make the index on that column the clustered key. Similarly, it is sometimes much better to cluster on commonly referenced date columns if rows are often searched for over date ranges.
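
The owner-ID case can be sketched with a toy model in Python: lay the same rows out in pages ordered by one column or the other, and count how many data pages contain the rows for a single owner. The table size, number of owners and rows-per-page figure are invented for the illustration, and the model counts data pages only, ignoring index traversal and caching.

```python
import random


def data_pages_touched(rows, cluster_col, filter_col, wanted, rows_per_page=100):
    """Count distinct data pages holding rows where filter_col == wanted,
    when the table is physically ordered by cluster_col."""
    ordered = sorted(rows, key=lambda r: r[cluster_col])
    return len({pos // rows_per_page
                for pos, row in enumerate(ordered)
                if row[filter_col] == wanted})


rng = random.Random(1)  # fixed seed so the run is repeatable
rows = [{"object_id": i, "owner_id": rng.randrange(500)} for i in range(50_000)]

by_object = data_pages_touched(rows, "object_id", "owner_id", wanted=7)
by_owner = data_pages_touched(rows, "owner_id", "owner_id", wanted=7)
print("clustered on object_id:", by_object, "pages")
print("clustered on owner_id: ", by_owner, "pages")
```

When the table is ordered by owner ID, one owner's rows sit next to each other on a handful of pages; ordered by object ID, they are scattered across many.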

If the PK of a given table is often a join target then clustering its PK can help as for certain join operations the reduction in further page lookups can be a big bonus, and of course if you have a PK based on real data (rather than a surrogate key like an auto-increment number or UUID) that is subject to ranged queries it has the benefits you'd expect. These reasons are why having your PK be clustered is generally a good position to start from before other considerations are taken into account, and hence why it is a common recommendation (and sometimes an automatically applied default).

As a side note: if you end up using a UUID column instead of an incrementing integer type as the PK on a table, then clustering on it can be harmful to performance because of the extra page splits created by inserting "random" data into the index (each page split on the clustered index results in extra IO activity on all the other indexes on the table too). This slows inserts and can exacerbate fragmentation issues over time. So in this situation it can often be much better to cluster a different index (or sometimes not have a clustered index at all, though this is not possible on SQL Server for Azure[1] and it is rare that not having a clustered key is a benefit rather than a detriment overall).

[1] It has been possible to have a heap (a table without a clustering key) on Azure SQL for some time now, though with the same caveat as on on-premises SQL Server: it is rarely a great idea.
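
The page-split effect of clustering on "random" keys, mentioned in the side note above, can be seen in a toy Python simulation. This is a simplified model under stated assumptions, not SQL Server's actual algorithm: leaf pages hold a fixed number of keys, an insert past the end of the index starts a fresh page, and an insert into a full page anywhere else splits it in half. The page capacity and key counts are arbitrary.

```python
import bisect
import random


def simulate_inserts(keys, page_capacity=100):
    """Insert keys into a toy list of sorted leaf pages.

    Returns (number_of_page_splits, average_page_fill)."""
    pages = [[]]                # leaf pages in key order
    lows = [float("-inf")]      # lowest key routed to each page
    splits = 0
    for key in keys:
        idx = bisect.bisect_right(lows, key) - 1
        page = pages[idx]
        if len(page) < page_capacity:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            # Append past the end: start a fresh page, nothing is moved.
            pages.append([key])
            lows.append(key)
        else:
            # Full page elsewhere: move half of its keys to a new page.
            mid = len(page) // 2
            right = page[mid:]
            del page[mid:]
            pages.insert(idx + 1, right)
            lows.insert(idx + 1, right[0])
            splits += 1
            bisect.insort(right if key >= right[0] else page, key)
    fill = sum(len(p) for p in pages) / (len(pages) * page_capacity)
    return splits, fill


rng = random.Random(42)
seq_splits, seq_fill = simulate_inserts(range(10_000))
rand_splits, rand_fill = simulate_inserts(rng.sample(range(10**9), 10_000))
print(f"sequential keys: {seq_splits} splits, pages {seq_fill:.0%} full")
print(f"random keys:     {rand_splits} splits, pages {rand_fill:.0%} full")
```

In this model the incrementing keys never split a page and leave every page full, while the random keys cause splits throughout and leave pages partly empty, which is the fragmentation the side note refers to.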