Poor temporal table performance on older values

In a comment from Zane on your question, he stated:

...It seems like part of your problem is you're reading 50 million rows in order to return 20K in the plan.

This is, indeed, the problem. There's no index available to push some, or all, of the predicates down to the storage engine. Microsoft recommends this baseline indexing strategy for temporal tables in the Docs article Temporal Table Considerations and Limitations:

An optimal indexing strategy will include a clustered columns store index and / or a B-tree rowstore index on the current table and a clustered columnstore index on the history table for optimal storage size and performance. If you create / use your own history table, we strongly recommend that you create this type of index consisting of period columns starting with the end of period column to speed up temporal querying as well as speeding up the queries that are part of the data consistency check. The default history table has a clustered rowstore index created for you based on the period columns (end, start). At a minimum, a non-clustered rowstore index is recommended

The phrasing of that is a little confusing (to me, anyway). But the takeaway is that you could create these indexes to improve performance some, if not quite a lot:

NC index on the current table, leading with `SysEndTime`:

CREATE NONCLUSTERED INDEX IX_SysEndTime_SysStartTime 
ON dbo.Benefits (SysEndTime, SysStartTime)
/*INCLUDE (ideally, include your other important fields here)*/;

This will allow you to avoid reading some of the rows in the current table by seeking to the appropriate end time.

CCI on the history table

CREATE CLUSTERED COLUMNSTORE INDEX ix_BenefitsHistory
ON dbo.BenefitsHistory
WITH (DROP_EXISTING = ON);

This will let you get batch mode on the history table, which should make the scans much faster.

NC index on the current table, leading with `SysStartTime`:

See Paul's answer to the question Most Efficient Way to Retrieve Date Ranges for more details on why indexing for date range queries is hard. Based on the logic there, it makes sense to add another NC index on the current table that leads with SysStartTime, so that the optimizer can choose which one to use based on statistics and the specific parameters of your query:

CREATE NONCLUSTERED INDEX IX_SysStartTime_SysEndTime
ON dbo.Benefits (SysStartTime, SysEndTime)
/*INCLUDE (ideally, include your other important fields here)*/;

Creating the 3 indexes outlined above made a significant difference in resource usage in my test cases. I set up a test case which runs two queries that return 1.5 million total rows. Both the history and current tables have 50 million rows).

Note: To reduce SSMS overhead, I ran the test with "Discard results after execution" option enabled.

Execution Plan - Default Indexes

Logical reads: 1,330,612
CPU time: 00:00:14.718
Elapsed time: 00:00:06.198

Execution Plan - With Indexes Described Above

Logical reads: 27,656 (8,111 row store + 19,545 columnstore)
CPU time: 00:00:01.828
Elapsed time: 00:00:01.150

As you can see, all 3 measures dropped significantly - including total elapsed time, from 6 seconds to 1 second.

The other option presented by the Docs article is to forgo the two NC indexes on the current table in favor of a clustered columnstore index. In my test, performance was very similar to the indexing solution described above.

The FOR SYSTEM TIME AS OF clause tries to return the dataset as it existed at the stated time. This means that updates have to be rolled back internally, deletes have to be 'undeleted', and inserts have to be ignored, based on the system time of the request.

The farther in the past the AS OF time is, the more work needs to be validated to ensure that the temporal table is as it existed at the specified system time, and thus the longer the query will take.

IF the data table is just a logging table, and no changes are made to the data, then using the date logged and an index will return data faster and more consistently. Whether to use the temporal features in this case is unnecessary. However, if changes are made to the rows (other than inserts), then using the temporal table feature is the only way to return the exact data being requested (the state of the table as it existed at that specific time), and you will just have to accept the additional overhead of the temporal queries.

Note: The "rollbacks" are not actual rollbacks. Temporal tables use two tables - a Current table, and a History table. When a row is changed, a copy of the previous version is inserted into the History table with the time range that the row was valid. If you insert a row at 10/20/2018 10:20:20.18, update a value at 10/25/2018 10:25:20.18, and update it again at 12/01/2018 12:01:20.18, you have the latest version of the row in the Current table with a start date of 12/01/2018 12:01:20.18, and two rows in the history table with valid ranges of 10/20 to 10/25/2018, and 10/25 to 12/01/2018

Poor temporal table performance on older values

NC index on the current table, leading with `SysEndTime`:

CCI on the history table

NC index on the current table, leading with `SysStartTime`:

Tags:

Performance

Sql Server

Temporal Tables

Related

Recent Posts

Poor temporal table performance on older values

NC index on the current table, leading with SysEndTime:

CCI on the history table

NC index on the current table, leading with SysStartTime:

Tags:

Performance

Sql Server

Temporal Tables

Related

NC index on the current table, leading with `SysEndTime`:

NC index on the current table, leading with `SysStartTime`: