SQL Staging Tables: Primary Key Clustered or Heap

Having an identity column doesn’t force you to use it as a clustered index key.
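To illustrate, here is a minimal sketch of a staging table that keeps its identity column but stays a heap. The table, column, and constraint names are hypothetical and not taken from the question.

```sql
-- Hypothetical staging table: the identity column exists,
-- but nothing makes it the clustering key.
CREATE TABLE dbo.StagingOrders
(
    StagingId   INT IDENTITY(1,1) NOT NULL,
    OrderNumber VARCHAR(20)       NOT NULL,
    Amount      DECIMAL(18,2)     NULL
);  -- no PRIMARY KEY and no clustered index, so the table is a heap

-- Optional: if a key is still wanted, declare it NONCLUSTERED
-- so the table itself remains a heap.
ALTER TABLE dbo.StagingOrders
    ADD CONSTRAINT PK_StagingOrders PRIMARY KEY NONCLUSTERED (StagingId);

-- Check the result: the row with index_id 0 and type_desc 'HEAP'
-- confirms there is no clustered index.
SELECT index_id, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.StagingOrders');
```

Keep in mind that a nonclustered primary key is still an index that has to be maintained during the load, so for a pure staging table you may prefer to leave it out.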

You’re right that heaps work well here. I would consider Thomas Kejser to be the authority on the subject, and it’s good you’ve listed him as one of your resources.

As for fragmentation in heaps: it doesn’t happen when the workload is insert-only.
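If you want to verify this on your own tables, one way is to look at forwarded records, which is the form heap fragmentation usually takes and which comes from updates that grow a row. This is a sketch only; dbo.StagingOrders is a hypothetical table name.

```sql
-- Run after a load, in the database that holds the staging table.
SELECT index_type_desc,
       page_count,
       forwarded_record_count,
       avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                          -- current database
         OBJECT_ID(N'dbo.StagingOrders'),  -- hypothetical table name
         0,                                -- index_id 0 is the heap itself
         NULL,                             -- all partitions
         'DETAILED'                        -- forwarded_record_count is NULL in LIMITED mode
     );
```

With a truncate-then-insert pattern you would expect forwarded_record_count to stay at 0. DETAILED mode reads every page, so run it outside the load window on large tables.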

Edit: Go through this article about parallel insert, and notice the comparisons between heaps and clustered indexes. https://blogs.msdn.microsoft.com/sqlcat/2016/07/21/real-world-parallel-insert-what-else-you-need-to-know/
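The pattern that article deals with looks roughly like the sketch below. Both table names are hypothetical, and whether the insert actually runs in parallel depends on the plan the optimizer picks, so treat this as a starting point for your own tests.

```sql
-- Parallel INSERT ... SELECT requires database compatibility level 130 or higher.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- The TABLOCK hint on the target is what makes the insert eligible for parallelism.
-- dbo.StagingHeap and dbo.SourceOrders are hypothetical tables.
INSERT INTO dbo.StagingHeap WITH (TABLOCK) (OrderNumber, Amount)
SELECT OrderNumber, Amount
FROM dbo.SourceOrders;
```

Check the actual execution plan to see whether the insert operator itself ran in a parallel zone, and compare timings against the same statement on a clustered target.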

We had a similar scenario and recently switched our staging tables from clustered indexes to heaps. The first big advantage for us was that we wanted to allow concurrent SSIS loads into the same staging table. You can do that with a clustered index, but you'll likely run into a lot of blocking, especially with an identity column. The second big advantage was cutting down on the overhead of loading the staging tables. We found that our loads went much faster on heaps compared to clustered indexes.

"Our performance testing does not show much difference, but our data may change later. So we need to make a good design decision."

Are you sure that this is true? In the question you say that you truncate your staging tables before the load. If some part of your load process changes, it should be very straightforward to add or remove a clustered index while the tables are empty. There's no data movement involved. It doesn't sound like you would get any benefit from a clustered index, so I would try it out as a heap and monitor performance.
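A rough sketch of what that switch looks like, assuming a hypothetical table dbo.StagingOrders with a StagingId column:

```sql
-- Empty the table first, as the load process already does.
TRUNCATE TABLE dbo.StagingOrders;

-- Heap -> clustered index: nothing to sort or move on an empty table.
CREATE CLUSTERED INDEX CIX_StagingOrders
    ON dbo.StagingOrders (StagingId);

-- Clustered index -> heap: just as cheap while the table is empty.
DROP INDEX CIX_StagingOrders ON dbo.StagingOrders;

-- If the clustered index backs a PRIMARY KEY constraint, drop the constraint instead:
-- ALTER TABLE dbo.StagingOrders DROP CONSTRAINT PK_StagingOrders;
```

Because the change is this cheap, the design decision is easy to reverse later if the data or the load pattern changes.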
