Cast to date is sargable but is it a good idea?

The mechanism behind the sargability of casting to date is called dynamic seek.

SQL Server calls an internal function GetRangeThroughConvert to get the start and end of the range.

Somewhat surprisingly this is not the same range as your literal values.

Creating a table with a row per page and 1440 rows per day

CREATE TABLE T
  (
     DateTimeCol DATETIME PRIMARY KEY,
     Filler      CHAR(8000) DEFAULT 'X'
  );

WITH Nums(Num)
     AS (SELECT number
         FROM   spt_values
         WHERE  type = 'P'
                AND number BETWEEN 1 AND 1440),
     Dates(Date)
     AS (SELECT {d '2012-12-30'} UNION ALL
         SELECT {d '2012-12-31'} UNION ALL
         SELECT {d '2013-01-01'} UNION ALL
         SELECT {d '2013-01-02'} UNION ALL
         SELECT {d '2013-01-03'})
INSERT INTO T
            (DateTimeCol)
SELECT DISTINCT DATEADD(MINUTE, Num, Date)
FROM   Nums,
       Dates

Then running

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT *
FROM   T
WHERE  DateTimeCol >= '20130101'
       AND DateTimeCol < '20130102'

SELECT *
FROM   T
WHERE  CAST(DateTimeCol AS DATE) = '20130101';

The first query has 1443 reads and the second 2883 so it is reading an entire additional day then discarding it against a residual predicate.

The plan shows the seek predicate is

Seek Keys[1]: Start: DateTimeCol > Scalar Operator([Expr1006]), 
               End: DateTimeCol < Scalar Operator([Expr1007])

So instead of >= '20130101' ... < '20130102' it reads > '20121231' ... < '20130102' then discards all the 2012-12-31 rows.

Another disadvantage of relying on it is that the cardinality estimates may not be as accurate as with the traditional range query. This can be seen in an amended version of your SQL Fiddle.

All 100 rows in the table now match the predicate (with datetimes 1 minute apart all on the same day).

The second (range) query correctly estimates that 100 will match and uses a clustered index scan. The CAST( AS DATE) query incorrectly estimates that only one row will match and produces a plan with key lookups.

The statistics aren't ignored completely. If all rows in the table have the same datetime and it matches the predicate (e.g. 20130101 00:00:00 or 20130101 01:00:00) then the plan shows a clustered index scan with an estimated 31.6228 rows.

100 ^ 0.75 = 31.6228

So in that case it appears the estimate is derived from this formula:

The following table shows the number of conjuncts guessed and the resultant selectivity as a function of input table cardinality of N:

| Conjuncts | Cardinality | Selectivity |
|-----------|-------------|-------------|
| 1         | N^(3/4)     | N^(-1/4)    |
| 2         | N^(11/16)   | N^(-5/16)   |
| 3         | N^(43/64)   | N^(-21/64)  |
| 4         | N^(171/256) | N^(-85/256) |
| 5         | N^(170/256) | N^(-86/256) |
| 6         | N^(169/256) | N^(-87/256) |
| 7         | N^(168/256) | N^(-88/256) |
| ...       |             |             |
| 175       | N^(0/256)   | N^(-1)      |

If all rows in the table have the same datetime and it doesn't match the predicate (e.g. 20130102 01:00:00) then it falls back to the estimated row count of 1 and the plan with lookups.

For the cases where the table has more than one DISTINCT value the estimated rows seems to be the same as if the query was looking for exactly 20130101 00:00:00.

If the statistics histogram happens to have a step at 2013-01-01 00:00:00.000 then the estimate will be based on the EQ_ROWS (i.e. not taking into account other times on that date). Otherwise if there is no step it looks as though it uses the AVG_RANGE_ROWS from the surrounding steps.

As datetime has a precision of approx 3ms in many systems there will be very few actual duplicate values and this number will be 1.

I know that this has a long-standing Great Answer® from Martin, but I wanted to add in some changes to the behavior here in newer versions of SQL Server. This appears only to have been tested up to 2008R2.

With the new USE HINTs that make doing some cardinality estimation time travel possible, we can see when things changed.

Using the same setup as in the SQL Fiddle.

CREATE TABLE T ( ID INT IDENTITY PRIMARY KEY, DateTimeCol DATETIME, Filler CHAR(8000) NULL );

CREATE INDEX IX_T_DateTimeCol ON T ( DateTimeCol );


WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
     E02(N) AS (SELECT 1 FROM E00 a, E00 b),
     E04(N) AS (SELECT 1 FROM E02 a, E02 b),
     E08(N) AS (SELECT 1 FROM E04 a, E04 b),
     Num(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY E08.N) FROM E08)
INSERT INTO T(DateTimeCol)
SELECT TOP 100 DATEADD(MINUTE, Num.N, '20130101')
FROM Num;

We can test the different levels like so:

SELECT *
FROM   T
WHERE  CAST(DateTimeCol AS DATE) = '20130101'
OPTION ( USE HINT ( 'QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_100' ));
GO

SELECT *
FROM   T
WHERE  CAST(DateTimeCol AS DATE) = '20130101'
OPTION ( USE HINT ( 'QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_110' ));
GO 

SELECT *
FROM   T
WHERE  CAST(DateTimeCol AS DATE) = '20130101'
OPTION ( USE HINT ( 'QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_120' ));
GO 

SELECT *
FROM   T
WHERE  CAST(DateTimeCol AS DATE) = '20130101'
OPTION ( USE HINT ( 'QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_130' ));
GO 

SELECT *
FROM   T
WHERE  CAST(DateTimeCol AS DATE) = '20130101'
OPTION ( USE HINT ( 'QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140' ));
GO

The plans for all of these are available here. Compat levels 100 and 110 both give the key lookup plan, but starting with compat level 120, we start getting the same scan plan with 100 row estimates. This is true up to compat level 140.

NUTS

Cardinality estimation for the >= '20130101', < '20130102' plans remain at 100, which was expected.

Cast to date is sargable but is it a good idea?

Tags:

Sql Server

Related

Recent Posts