SQL Server: Accurate Row Estimates without Histogram?

First, as you noticed, SQL Server will automatically create missing statistics (where possible) when compiling an execution plan, if the database option AUTO_CREATE_STATISTICS is on.

When statistics are not available, the default guess for an unknown inequality predicate is 0.3 (30%). When you use a local variable, the value in the variable cannot be sniffed (unless OPTION (RECOMPILE) is also specified). With a table cardinality of 316, the estimate is 0.3 * 316 = 94.8.

When you use a constant literal, the value is sniffed. Without statistics, it cannot use this sniffed value to check the histogram as it normally would, but it can judge its impact on the CHECK constraint (the one limiting Rate values to between $6.50 and $200).

If the sniffed value does not exclude the check constraint range completely, the estimate is based on the 0.09 (9%) guessed selectivity for a BETWEEN predicate (from the check constraint). 316 * 0.09 = 28.44.

If the sniffed value does exclude the check constraint range completely, the estimate is always 1 row (the cardinality estimator (almost) never produces an estimated number of rows less than 1).

If the query is simple enough to qualify for a trivial plan, and it is considered safe for simple parameterization, the constant literal is replaced with a parameter marker e.g. @1. This happens for a query like:

SELECT * 
FROM HumanResources.EmployeePayHistory AS EPH 
WHERE EPH.Rate > $200;

The execution plan shows an estimate of 1 row:

Plan

And the Scan Predicate shows a parameter marker:

Properties

If simple parameterization is prevented, e.g. by adding a clause that compares a constant with a constant:

SELECT * 
FROM HumanResources.EmployeePayHistory AS EPH 
WHERE EPH.Rate > $200
AND 1 = 1;

Without parameterization, this plan can never be reused with a different value for the parameter, so the optimizer can statically eliminate table access on the basis that the check constraint prevents any rows being returned. The resuting plan is:

Constant scan

Finally, be careful with types. The type of the Rate column is money, not decimal. Conversions can affect cardinality estimation in complicated ways. Specify a money literal with the $ prefix, or use an explicit CAST or CONVERT.

The internal details of cardinality estimation are not documented publicly (probably to avoid endless ever more detailed questions and complaints when things change), but there are a range of resources out there to aid you in this area. Just remember most of this is unoffical and so not supported by anyone. Some aspects are actually dangerous.

Some apply only to the original (pre-2014) cardinality estimator, some explain general principles applicable to both, and some are applicable to the "new" CE only. The following is not meant as an authoritative or complete list, just the ones that came to mind immediately:

The SQL Server 2014 Cardinality Estimator
13 Things You Should Know About Statistics and the Query Optimizer
Cardinality Estimation (SQL Server)
Constant Folding and Expression Evaluation During Cardinality Estimation
Statistics Used by the Query Optimizer
Plan Caching and Recompilation
Nacho Alonso Portillo's blog (MSFT) (search for cardinality)
Dima Piliugin's blog
SQLblog.com posts
SQLperformance.com posts
Microsoft SQL Server Internals (book) Kalen Delaney et al.

SQL Server: Accurate Row Estimates without Histogram?

Tags:

Sql Server

Optimization

Related

Recent Posts