SQL Offset total row count slow with IN Clause
Step one for performance related questions is going to be to analyze your table/index structure, and to review the query plans. You haven't provided that information, so I'm going to make up my own, and go from there.
I'm going to assume that you have a heap, with ~10M rows (12,872,738 for me):
```sql
DECLARE @MaxRowCount bigint = 10000000,
        @Offset bigint = 0;

DROP TABLE IF EXISTS #ExampleTable;
CREATE TABLE #ExampleTable
(
    ID bigint NOT NULL,
    Name varchar(50) COLLATE DATABASE_DEFAULT NOT NULL
);

WHILE @Offset < @MaxRowCount
BEGIN
    INSERT INTO #ExampleTable
    ( ID, Name )
    SELECT ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL )),
           ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ))
    FROM master.dbo.spt_values SV
        CROSS APPLY master.dbo.spt_values SV2;
    SET @Offset = @Offset + ROWCOUNT_BIG();
END;
```
If I run the query provided over #ExampleTable, it takes about 4 seconds and gives me this query plan:
This isn't a great query plan by any means, but it is hardly awful. Running with live query stats shows that the cardinality estimates were at most off by one, which is fine.
Let's try a massive number of items in our IN list (5000 items, from 1 to 5000). Compiling the plan took 4 seconds:
I can get my number up to 15000 items before the query processor stops being able to handle it, with no change in query plan (it does take a total of 6 seconds to compile). Running both queries takes about 5 seconds a pop on my machine.
This is probably fine for analytical or data-warehousing workloads, but for OLTP-like queries we've definitely exceeded our ideal time limit.
Let's look at some alternatives. We can probably use some of these in combination.
- We could cache off the IN list in a temp table or table variable.
- We could use a window function to calculate the count
- We could cache off our CTE in a temp table or table variable
- If on a sufficiently high SQL Server version, use batch mode
- Change the indices on your table to make this faster.
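For the first option, here is a minimal sketch of caching the IN list off in a temp table (table and variable names are my own invention; #ExampleTable stands in for your real table, and the ID values are placeholders):

```sql
-- Hypothetical sketch: materialize the IN list once, then join against it
-- instead of passing thousands of literals into the query.
DECLARE @PageSize bigint = 100,
        @PageNum  bigint = 1;

CREATE TABLE #FilterIds ( ID bigint NOT NULL PRIMARY KEY );

INSERT INTO #FilterIds ( ID )
VALUES (1), (2), (3), (4), (5); -- ...the rest of your list

SELECT E.ID,
       E.Name,
       COUNT( * ) OVER ( ) AS MaxRows
FROM #ExampleTable E
    INNER JOIN #FilterIds F
        ON F.ID = E.ID
ORDER BY E.Name
OFFSET ( @PageNum - 1 ) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;

DROP TABLE #FilterIds;
```

Keep the recompile caveat discussed below in mind before reaching for this in an OLTP workload.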
If this is for an OLTP workflow, then we need something that is fast regardless of how many users we have. As such, we want to minimize recompiles and we want index seeks wherever possible. If this is analytic or warehousing, then recompiles and scans are probably fine.
If we want OLTP, then the caching options are probably off the table. Temp tables will always force recompiles, and table variables in queries that rely on a good estimate require you to force a recompile. The alternative would be to have some other part of your application maintain a persistent table that has paginated counts or filters (or both), and then have this query join against that.
If the same user would look at many pages, then caching off part of it is probably still worth it even in OLTP, but make sure you measure the impact of many concurrent users.
Regardless of workflow, updating indices is probably okay (unless your workflows are going to really mess with your index maintenance).
Regardless of workflow, batch mode will be your friend.
Regardless of workflow, window functions (especially with either indices and/or batch mode) will probably be better.
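For the index option, here is a sketch (the index name is mine, and the right choice depends on your actual schema and workload). Keying on ID with Name included lets the ID IN ( ... ) filter resolve with seeks, and covers the query so no lookups are needed:

```sql
-- Hypothetical covering index for the example heap above.
CREATE UNIQUE NONCLUSTERED INDEX IX_ExampleTable_ID
    ON #ExampleTable ( ID )
    INCLUDE ( Name );
```

An index keyed on Name instead would support the ORDER BY and could avoid the sort, at the cost of scanning rather than seeking on the filter; which trade-off wins is something to measure.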
Batch mode and the default cardinality estimator
We pretty consistently get poor cardinality estimates (and resulting plans) with the legacy cardinality estimator and row-mode executions. Forcing the default cardinality estimator helps with the first, and batch-mode helps with the second.
If you can't update your database to use the new cardinality estimator wholesale, you'll want to enable it for your specific query. To do that, add the query hint OPTION( USE HINT( 'FORCE_DEFAULT_CARDINALITY_ESTIMATION' ) ). To get batch mode, add a join to a clustered columnstore index (CCI) that doesn't need to return any data: LEFT OUTER JOIN dbo.EmptyCciForRowstoreBatchmode ON 1 = 0. This enables SQL Server to pick batch-mode optimizations. These recommendations assume a sufficiently new SQL Server version.
Which CCI you join to doesn't matter; we like to keep an empty one around for consistency, which looks like this:

```sql
CREATE TABLE dbo.EmptyCciForRowstoreBatchmode
(
    __zzDoNotUse int NULL,
    INDEX CCI CLUSTERED COLUMNSTORE
);
```
The best plan I could get without modifying the table was to use both of them. With the same data as before, this runs in <1s.
```sql
WITH TempResult AS
(
    SELECT ID,
           Name,
           COUNT( * ) OVER ( ) MaxRows
    FROM #ExampleTable
    WHERE ID IN ( <<really long LIST>> )
)
SELECT TempResult.ID,
       TempResult.Name,
       TempResult.MaxRows
FROM TempResult
    LEFT OUTER JOIN dbo.EmptyCciForRowstoreBatchmode ON 1 = 0
ORDER BY TempResult.Name
OFFSET ( @PageNum - 1 ) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY
OPTION( USE HINT( 'FORCE_DEFAULT_CARDINALITY_ESTIMATION' ) );
```
As far as I know, there are three ways to achieve this, besides the #temp table approach already mentioned. In my test cases below, I used a SQL Server 2016 Developer instance with 6 CPUs/16 GB RAM and a simple table containing ~25M rows.
Method 1: CROSS JOIN
```sql
DECLARE @PageSize INT = 10,
        @PageNum  INT = 1;

WITH TempResult AS
(
    SELECT id, shortDesc
    FROM dbo.TestName
    WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
)
SELECT *, MaxRows
FROM TempResult
    CROSS JOIN (SELECT COUNT(1) AS MaxRows FROM TempResult) AS TheCount
ORDER BY TempResult.shortDesc
OFFSET (@PageNum - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;
```
Test result 1:
Method 2: COUNT(*) OVER()
```sql
DECLARE @PageSize INT = 10,
        @PageNum  INT = 1;

WITH TempResult AS
(
    SELECT id, shortDesc
    FROM dbo.TestName
    WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
)
SELECT *, MaxRows = COUNT(*) OVER()
FROM TempResult
ORDER BY TempResult.shortDesc
OFFSET (@PageNum - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;
```
Test result 2:
Method 3: 2nd CTE
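The question's exact T-SQL isn't reproduced here, but a minimal sketch of the second-CTE pattern, against the same dbo.TestName test table as the other methods, would look like this:

```sql
DECLARE @PageSize INT = 10,
        @PageNum  INT = 1;

WITH TempResult AS
(
    SELECT id, shortDesc
    FROM dbo.TestName
    WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
),
TempCount AS
(
    -- Second CTE holds the single-row total for the filtered set.
    SELECT COUNT(*) AS MaxRows FROM TempResult
)
SELECT TempResult.*, TempCount.MaxRows
FROM TempResult
    CROSS JOIN TempCount
ORDER BY TempResult.shortDesc
OFFSET (@PageNum - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;
```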
Test result 3 (T-SQL used was the same as in the question):
The fastest method depends on your data structure and total row count, in combination with your server's sizing and load. In my case, COUNT(*) OVER() proved to be the fastest method; to find what's best for you, test each approach against your own scenario. And don't rule out the #temp table approach just yet ;-)