How to improve estimate of 1 row in a View constrained by DateAdd() against an index

A less comprehensive answer than Aaron's, but the core issue is a cardinality estimation bug with DATEADD when it is used with the datetime2 type:

Connect: Incorrect estimate when sysdatetime appear in a dateadd() expression

One workaround is to use GETUTCDATE (which returns datetime):

WHERE CreatedUtc > CONVERT(datetime2(7), DATEADD(DAY, -365, GETUTCDATE()))

Note the conversion to datetime2 must be outside the DATEADD to avoid the bug.
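To make the placement of the conversion concrete, here is a minimal sketch of the three forms side by side. The temp table #Session and its index are stand-ins made up for illustration, not the asker's schema. Compare the estimated row counts in the plans on your own data, since the estimates depend on the version, the cardinality estimator model, and the statistics.

```sql
-- Hypothetical repro table (names are assumptions for illustration only)
CREATE TABLE #Session
(
  SessionId  int          NOT NULL,
  CreatedUtc datetime2(7) NOT NULL
);
CREATE INDEX IX_Session_CreatedUtc ON #Session (CreatedUtc);

-- Form 1: DATEADD operates directly on a datetime2 value.
-- This is the pattern the Connect item describes.
SELECT SessionId FROM #Session
WHERE CreatedUtc > DATEADD(DAY, -365, SYSUTCDATETIME());

-- Form 2: the conversion to datetime2 happens INSIDE DATEADD,
-- so DATEADD still receives datetime2.
SELECT SessionId FROM #Session
WHERE CreatedUtc > DATEADD(DAY, -365, CONVERT(datetime2(7), GETUTCDATE()));

-- Form 3 (the workaround): DATEADD works on datetime,
-- and the result is converted to datetime2 OUTSIDE DATEADD.
SELECT SessionId FROM #Session
WHERE CreatedUtc > CONVERT(datetime2(7), DATEADD(DAY, -365, GETUTCDATE()));

DROP TABLE #Session;
```

All three predicates are sargable; the difference to look for is in the estimate, not in whether a seek is possible.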

An incorrect cardinality estimation reproduces for me in all versions of SQL Server up to and including 2019 CU8 GDR (build 15.0.4083) when the 70 model cardinality estimator is used.

Aaron Bertrand has written an article about this for SQLPerformance.com:

Performance Surprises and Assumptions : DATEADD()

In some scenarios SQL Server can have really wild estimates for DATEADD/DATEDIFF, depending on what the arguments are and what your actual data looks like. I wrote about this for DATEDIFF when dealing with beginning of the month, and some workarounds, here:

Performance Surprises and Assumptions : DATEDIFF

But my typical advice is to just stop using DATEADD/DATEDIFF in WHERE/JOIN clauses.

The following approach, while not super accurate when a leap year is in the filtered range (it will include an extra day in that case), and while rounded to the day, will get better (but still not great!) estimates, just like your non-sargable DATEDIFF against the column approach, and still allow a seek to be used:

DECLARE @start date = DATEFROMPARTS
(
  YEAR(GETUTCDATE())-1, 
  MONTH(GETUTCDATE()), 
  DAY(GETUTCDATE())
);

SELECT ... WHERE CreatedUtc >= @start;

You could manipulate the inputs to DATEFROMPARTS to avoid issues on leap day, use DATETIMEFROMPARTS to get more precision instead of rounding to the day, etc. This is just to demonstrate that you can populate a variable with a date in the past without using DATEADD (it's just a little more work), and thus avoid the more crippling part of the estimation bug (which is fixed in 2014+).

To avoid errors on leap day, you can do this instead, starting from last year's Feb 28 instead of 29:

DECLARE @start date = DATEFROMPARTS
(
  YEAR(GETUTCDATE())-1, 
  MONTH(GETUTCDATE()), 
  CASE WHEN DAY(GETUTCDATE()) = 29 AND MONTH(GETUTCDATE()) = 2 
    THEN 28 ELSE DAY(GETUTCDATE()) END
);
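As a quick check of the leap-day handling, here is a small sketch that swaps GETUTCDATE() for a fixed date (the variable @today and the chosen date are assumptions, used only so the result is repeatable):

```sql
-- Fixed leap day standing in for GETUTCDATE() (assumption for illustration)
DECLARE @today date = '20240229';

SELECT DATEFROMPARTS
(
  YEAR(@today)-1,
  MONTH(@today),
  CASE WHEN DAY(@today) = 29 AND MONTH(@today) = 2
    THEN 28 ELSE DAY(@today) END
) AS start_date;  -- 2023-02-28
```

Without the CASE expression, the same call would ask DATEFROMPARTS for February 29, 2023, which is not a valid date and raises an error.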

You could also check whether we're already past a leap day this year and, if so, add a day to the starting point (interestingly, using DATEADD here still allows for accurate estimates):

DECLARE @base date = GETUTCDATE();
IF GETUTCDATE() >= DATEFROMPARTS(YEAR(GETUTCDATE()),3,1) AND 
  -- build Feb 29 as a string: DATEFROMPARTS would raise an error in a non-leap year
  TRY_CONVERT(datetime, RTRIM(YEAR(GETUTCDATE()))+'0229') IS NOT NULL
BEGIN
  SET @base = DATEADD(DAY, 1, GETUTCDATE());
END

DECLARE @start date = DATEFROMPARTS
(
  YEAR(@base)-1, 
  MONTH(@base),
  CASE WHEN DAY(@base) = 29 AND MONTH(@base) = 2 
    THEN 28 ELSE DAY(@base) END
);

SELECT ... WHERE CreatedUtc >= @start;
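One way to test whether a given year contains a February 29 is to build the date as a string and let TRY_CONVERT return NULL when it is not valid. This is only a sketch; the derived table y and the sample years are made up for illustration:

```sql
-- Sample years are assumptions chosen to cover both outcomes
SELECT y.yr,
  CASE WHEN TRY_CONVERT(date, RTRIM(y.yr) + '0229') IS NOT NULL
    THEN 1 ELSE 0 END AS has_leap_day
FROM (VALUES (2023),(2024),(2100)) AS y(yr);
-- 2023 -> 0, 2024 -> 1, 2100 -> 0
```

The string form matters here because TRY_CONVERT only turns a failed conversion into NULL; it does not suppress an error raised while evaluating its argument.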

If you need to be more accurate than to the day at midnight, then you can just add more manipulation before the select:

DECLARE @accurate_start datetime2(7) = DATETIME2FROMPARTS
(
  YEAR(@start), MONTH(@start), DAY(@start),
  DATEPART(HOUR,  SYSUTCDATETIME()), 
  DATEPART(MINUTE,SYSUTCDATETIME()),
  DATEPART(SECOND,SYSUTCDATETIME()), 
  0,0
);

SELECT ... WHERE CreatedUtc >= @accurate_start;

Now, you could jam all of this into a view, and it will still use a seek and the 30% estimate without requiring any hints or trace flags, but it ain't pretty. The nested CTEs are only there so that I don't have to type SYSUTCDATETIME() a hundred times or repeat reused expressions; the expressions can still be evaluated multiple times.

CREATE VIEW dbo.v5 
AS
  WITH d(d) AS ( SELECT SYSUTCDATETIME() ),
  base(d) AS
  (
    SELECT DATEADD(DAY,CASE WHEN d >= DATEFROMPARTS(YEAR(d),3,1) 
      -- test this year's Feb 29, not today's date (which always converts)
      AND TRY_CONVERT(datetime,RTRIM(YEAR(d))+'0229') IS NOT NULL 
      THEN 1 ELSE 0 END, d)
    FROM d
  ),
  src(d) AS
  (
    SELECT DATETIME2FROMPARTS
    (
      YEAR(d)-1, 
      MONTH(d),
      CASE WHEN MONTH(d) = 2 AND DAY(d) = 29
        THEN 28 ELSE DAY(d) END,
      DATEPART(HOUR,d), 
      DATEPART(MINUTE,d),
      DATEPART(SECOND,d),
      10*DATEPART(MICROSECOND,d),
      7
    ) FROM base
  )
  SELECT DISTINCT SessionId FROM [User].[Session]
    WHERE CreatedUtc >= (SELECT d FROM src);

This is a lot more verbose than your DATEDIFF against the column but, as I mentioned in a comment, that approach is not sargable. It will probably perform competitively while most of the table has to be read anyway, but I suspect it will become a burden as "the last year" becomes a lower percentage of the table.

Also, just for reference, here are some of the metrics I got when I tried to reproduce:

I couldn't get 1-row estimates, and I tried very hard to match your distribution (3.13 million rows, 2.89 million from the last year). But you can see:

Both of our solutions perform roughly equivalent reads.
Your solution is slightly less accurate because it only accounts for day boundaries (and that might be fine; my view could be made less precise to match).
Trace flag 4199 + recompile did not really change the estimates (or the plans).

Don't draw too much from the duration figures - they're close now, but may not stay close as the table grows (again, I believe because even the seek still has to read most of the table).

Here are the plans for v4 (your datediff against column) and v5 (my version):
