Parameter Sniffing vs VARIABLES vs Recompile vs OPTIMIZE FOR UNKNOWN

The query is

SELECT SUM(Amount) AS SummaryTotal
FROM   PDetail WITH(NOLOCK)
WHERE  ClientID = @merchid
       AND PostedDate BETWEEN @datebegin AND @dateend 

The table contains 103,129,000 rows.

The fast plan looks up by ClientID with a residual predicate on the date, but needs to do 96 lookups to retrieve the Amount. The <ParameterList> section in the plan is as follows.

        <ParameterList>
          <ColumnReference Column="@dateend" 
                           ParameterRuntimeValue="'2013-02-01 23:59:00.000'" />
          <ColumnReference Column="@datebegin" 
                           ParameterRuntimeValue="'2013-01-01 00:00:00.000'" />
          <ColumnReference Column="@merchid" 
                           ParameterRuntimeValue="(78155)" />
        </ParameterList>

The slow plan looks up by date and uses lookups to evaluate the residual predicate on ClientID and to retrieve the Amount (estimated 1 row vs actual 7,388,383). The <ParameterList> section is

        <ParameterList>
          <ColumnReference Column="@EndDate" 
                           ParameterCompiledValue="'2013-02-01 23:59:00.000'" 
                           ParameterRuntimeValue="'2013-02-01 23:59:00.000'" />
          <ColumnReference Column="@BeginDate" 
                           ParameterCompiledValue="'2013-01-01 00:00:00.000'"               
                           ParameterRuntimeValue="'2013-01-01 00:00:00.000'" />
          <ColumnReference Column="@ClientID" 
                           ParameterCompiledValue="(78155)" 
                           ParameterRuntimeValue="(78155)" />
        </ParameterList>

In this second case the ParameterCompiledValue is not empty. SQL Server successfully sniffed the values used in the query.

The book "SQL Server 2005 Practical Troubleshooting" has this to say about using local variables

Using local variables to defeat parameter sniffing is a fairly common trick, but the OPTION (RECOMPILE) and OPTION (OPTIMIZE FOR) hints ... are generally more elegant and slightly less risky solutions


Note

In SQL Server 2005, statement level compilation allows for compilation of an individual statement in a stored procedure to be deferred until just before the first execution of the query. By then the local variable's value would be known. Theoretically SQL Server could take advantage of this to sniff local variable values in the same way that it sniffs parameters. However, because it was common to use local variables to defeat parameter sniffing in SQL Server 7.0 and SQL Server 2000, sniffing of local variables was not enabled in SQL Server 2005. It may be enabled in a future SQL Server release, though, which is a good reason to use one of the other options outlined in this chapter if you have a choice.
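
For illustration, here is a minimal sketch of the two hints from the quote applied to the query in the question (the table and parameter names come from the question; assume the statement sits inside a parameterised stored procedure, and the literal 78155 is simply the value sniffed in the plans above):

SELECT SUM(Amount) AS SummaryTotal
FROM   PDetail WITH(NOLOCK)
WHERE  ClientID = @merchid
       AND PostedDate BETWEEN @datebegin AND @dateend
OPTION (RECOMPILE) /*fresh plan compiled for the actual values on every execution*/

SELECT SUM(Amount) AS SummaryTotal
FROM   PDetail WITH(NOLOCK)
WHERE  ClientID = @merchid
       AND PostedDate BETWEEN @datebegin AND @dateend
OPTION (OPTIMIZE FOR (@merchid = 78155)) /*plan compiled for a fixed representative value*/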


From a quick test at this end, the behaviour described in the note above is still the same in SQL Server 2008 and 2012: variables are not sniffed at the deferred compile; they are only sniffed when an explicit OPTION (RECOMPILE) hint is used.

DECLARE @N INT = 0

CREATE TABLE #T ( I INT );

/*Reference to #T means this statement is subject to deferred compile*/
SELECT *
FROM   master..spt_values
WHERE  number = @N
       AND EXISTS(SELECT COUNT(*) FROM #T)

/*Explicit OPTION (RECOMPILE) means the runtime value of @N is sniffed for this statement*/
SELECT *
FROM   master..spt_values
WHERE  number = @N
OPTION (RECOMPILE)

DROP TABLE #T 

Despite the deferred compile, the variable is not sniffed and the estimated row count is inaccurate:

[Image: Estimates vs Actual]

So I assume that the slow plan relates to a parameterised version of the query.

The ParameterCompiledValue is equal to ParameterRuntimeValue for all of the parameters so this is not typical parameter sniffing (where the plan was compiled for one set of values then run for another set of values).

The problem is that the plan that is compiled for the correct parameter values is inappropriate.

You are likely hitting the issue with ascending dates described here and here. For a table with 100 million rows you need to insert (or otherwise modify) around 20 million rows before SQL Server will automatically update the statistics for you. It seems that the last time they were updated zero rows matched the date range in the query, but now 7 million do.
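
As a rough sanity check on that figure (assuming the classic pre-2014 auto-update threshold of 20% of the table's rows plus 500 modifications):

-- 20% of 103,129,000 rows, plus 500
SELECT 103129000 * 0.20 + 500 AS RowsModifiedBeforeAutoUpdate;   -- ~ 20,626,300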

You could schedule more frequent statistics updates, consider trace flags 2389 and 2390, or use OPTIMIZE FOR UNKNOWN so the optimizer just falls back on guesses rather than using the currently misleading statistics on the datetime column.
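
A minimal sketch of the last two options (the hinted query reuses the names from the question; the hint is shown per parameter for the two dates, while plain OPTION (OPTIMIZE FOR UNKNOWN) would apply to all parameters, and enabling trace flags instance-wide should of course be tested first):

-- Ascending-key trace flags, enabled instance-wide
DBCC TRACEON (2389, 2390, -1);

-- Or ignore the sniffed date values and fall back on the density-based guess
SELECT SUM(Amount) AS SummaryTotal
FROM   PDetail WITH(NOLOCK)
WHERE  ClientID = @merchid
       AND PostedDate BETWEEN @datebegin AND @dateend
OPTION (OPTIMIZE FOR (@datebegin UNKNOWN, @dateend UNKNOWN))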

This might not be necessary in the next version of SQL Server (after 2012). A related Connect item contains the intriguing response

Posted by Microsoft on 8/28/2012 at 1:35 PM
We've done a cardinality estimation enhancement for the next major release that essentially fixes this. Stay tuned for details once our previews come out. Eric

This 2014 improvement is looked at by Benjamin Nevarez towards the end of the article:

A First Look at the New SQL Server Cardinality Estimator.

It appears the new cardinality estimator will fall back and use average density in this case rather than giving the 1 row estimate.

Some additional details about the 2014 cardinality estimator and ascending key problem here:

New functionality in SQL Server 2014 – Part 2 – New Cardinality Estimation


So, my question is this... how can parameter sniffing be to blame when we get the same slow query on an empty plan cache... there shouldn't be any parameters to sniff?

When SQL Server compiles a query containing parameter values, it sniffs the specific values of those parameters for cardinality (row count) estimation. In your case, the particular values of @BeginDate, @EndDate and @ClientID are used when choosing an execution plan. You can find more details about parameter sniffing here and here. I am providing these background links because the question above makes me think the concept is imperfectly understood at present: there are always parameter values to sniff when a plan is compiled.
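
As a small illustration of that point, any parameterised compilation sniffs whatever values are supplied at the time, for example when the query is run through sp_executesql (your application's actual call mechanism may differ; the values here are the ones from the plans above):

EXEC sys.sp_executesql
    N'SELECT SUM(Amount) AS SummaryTotal
      FROM   dbo.PDetail
      WHERE  ClientID = @ClientID
             AND PostedDate BETWEEN @BeginDate AND @EndDate',
    N'@ClientID int, @BeginDate datetime, @EndDate datetime',
    @ClientID = 78155, @BeginDate = '2013-01-01 00:00:00.000', @EndDate = '2013-02-01 23:59:00.000';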

Anyway, that's all beside the point, because parameter sniffing is not the problem here as Martin Smith has pointed out. At the time the slow query was compiled, the statistics indicated there were no rows for the sniffed values of @BeginDate and @EndDate:

[Image: Slow plan sniffed values]

The sniffed values are very recent, suggesting the ascending key problem Martin mentions. Because the index seek on the dates is estimated to return just a single row, the optimizer chooses a plan that pushes the predicate on ClientID to the Key Lookup operator as a residual.

The single row estimate is also the reason the optimizer stops looking for better plans, returning a Good Enough Plan Found message. The estimated total cost of the slow plan with the single row estimate is just 0.013136 cost units, so there is no point trying to find anything better. Except, of course, the seek actually returns 7,388,383 rows rather than one, causing the same number of Key Lookups.
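
If you want to confirm this yourself, the early-termination reason is recorded in the plan XML (StatementOptmEarlyAbortReason="GoodEnoughPlanFound"); a rough way to find such plans while they are still in cache is:

SELECT cp.usecounts, qp.query_plan
FROM   sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE  CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE N'%GoodEnoughPlanFound%';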

Statistics can be tricky to keep up-to-date and useful on large tables, and partitioning introduces challenges of its own in that regard. I have not had particular success myself with trace flags 2389 and 2390, but you are welcome to test them. More recent builds of SQL Server (2008 R2 SP1 and later) have dynamic statistics updates available, but per-partition statistics updates are still not implemented. In the meantime, you might like to schedule a manual statistics update whenever you make significant changes to this table.
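
A minimal sketch of such a manual update (the statistics name is a placeholder; at this table size a sampled update may be more practical than FULLSCAN):

-- Refresh just the statistics on the date column after a large load
UPDATE STATISTICS dbo.PDetail (Stats_PostedDate) WITH FULLSCAN;

-- Or refresh all statistics on the table
UPDATE STATISTICS dbo.PDetail;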

For this particular query, I would think about implementing the index suggested by the optimizer during compilation of the fast query plan:

/*
The Query Processor estimates that implementing the following index could improve
the query cost by 98.8091%.

WARNING: This is only an estimate, and the Query Processor is making this 
recommendation based solely upon analysis of this specific query.
It has not considered the resulting index size, or its workload-wide impact,
including its impact on INSERT, UPDATE, DELETE performance.
These factors should be taken into account before creating this index.
*/
CREATE NONCLUSTERED INDEX [<Name of Missing Index>]
ON [dbo].[PDetail] ([ClientID],[PostedDate])
INCLUDE ([Amount]);

The index should be partition-aligned, with an ON PartitionSchemeName (PostedDate) clause, but the main point is that providing an obviously best data access path helps the optimizer avoid poor plan choices, without resorting to OPTIMIZE FOR UNKNOWN hints or old-fashioned workarounds like using local variables.
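
For completeness, a sketch of the partition-aligned form (the index and partition scheme names are placeholders):

CREATE NONCLUSTERED INDEX IX_PDetail_ClientID_PostedDate
ON dbo.PDetail ([ClientID], [PostedDate])
INCLUDE ([Amount])
ON PartitionSchemeName (PostedDate);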

With the improved index, the Key Lookup to retrieve the Amount column is eliminated, and the query processor can still perform dynamic partition elimination and use a seek to find the particular ClientID and date range.