Wrong estimate on a query on partitioned tables

The estimates (with the new cardinality estimator) are accurate for a regular join, but become less accurate when the optimizer considers a colocated join.

A colocated join (aka per-partition join) is available when joining two tables that are partitioned in the same way. The idea is to join one partition at a time, using a nested loops apply driven by partition IDs supplied by a Constant Scan (an in-memory table of values).
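Logically, the colocated plan behaves like the hand-written per-partition join below. This is only a sketch for illustration: the partition function name pfDateKey and the partition numbers 1 to 4 are hypothetical placeholders, and the table and column names are taken from the query shown in the other answer.

```sql
-- Hypothetical: A and B are both partitioned on DateKey by a partition
-- function named pfDateKey, assumed here to have 4 partitions.
SELECT j.DateKey, j.Type
FROM (VALUES (1), (2), (3), (4)) AS p (PtnId)        -- plays the role of the Constant Scan
CROSS APPLY                                            -- plays the role of the nested loops apply
(
    SELECT a.DateKey, a.Type
    FROM A AS a
    JOIN B AS b
        ON b.DateKey = a.DateKey
        AND b.Type = a.Type
    WHERE $PARTITION.pfDateKey(a.DateKey) = p.PtnId    -- one partition of A...
    AND $PARTITION.pfDateKey(b.DateKey) = p.PtnId      -- ...joined to the matching partition of B
    AND a.DateKey = 20140802
) AS j;
```

Note that the DateKey filter already pins the rows to a single partition, so the per-partition predicates add no real selectivity.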

Regular join

Because the colocated join requires a nested loops apply, you can prevent the optimizer from choosing it by specifying a physical join hint such as OPTION (HASH JOIN):

[Image: plan with hash join forced]

The two seeks in that plan are:

Seek Keys[1]: Prefix:
    PtnId1000, [dbo].[A].DateKey = Scalar Operator((3)), Scalar Operator((20140802))
Seek Keys[1]: Prefix:
    PtnId1003, [dbo].[B].DateKey = Scalar Operator((3)), Scalar Operator((20140802))

The optimizer has applied static partition elimination in both cases, giving accurate estimates for both seeks and for the join that follows.
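A sketch of the query with the hash join forced, assuming the query from the question is the one reproduced in the other answer (tables A and B joined on DateKey and Type):

```sql
SELECT a.DateKey, a.Type
FROM A AS a
  JOIN B AS b
    ON b.DateKey = a.DateKey
    AND b.Type = a.Type
WHERE a.DateKey = 20140802
OPTION (HASH JOIN); -- rules out nested loops, and with it the colocated (apply) alternative
```

A query-level join hint applies to every join in the query, so on a larger query it constrains more than just this one join.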

Colocated join

When the optimizer considers a colocated join (as shown in the question), the seeks are:

[Image: colocated join plan]

Seek Keys[1]: Prefix:
    PtnId1000, [dbo].[A].DateKey = Scalar Operator([Expr1006]), Scalar Operator((20140802))
Seek Keys[1]: Prefix:
    PtnId1003, [dbo].[B].DateKey = Scalar Operator([Expr1006]), Scalar Operator((20140802))

...where [Expr1006] is the value returned by the Constant Scan operator.

The cardinality estimator now cannot see that the DateKey value and the partition ID are interdependent, as it could when literal constants were used. In other words, it is not apparent to the estimator that the value in [Expr1006] identifies the same partition as DateKey = 20140802.

As a consequence, the CE (by default) combines the selectivities of the two apparently independent predicates using its normal exponential backoff method.
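Exponential backoff sorts predicate selectivities from most to least selective and dampens each subsequent one, using at most four: s1 × s2^(1/2) × s3^(1/4) × s4^(1/8). As a worked sketch for the two predicates here, write s_d for DateKey = 20140802 and s_p for the partition ID predicate; the numbers in the last line are hypothetical, chosen only to show the size of the effect.

```latex
\begin{aligned}
S_{\text{backoff}} &= \min(s_d, s_p)\,\sqrt{\max(s_d, s_p)} \;<\; s_d
  \qquad (0 < s_d < 1,\ 0 < s_p < 1) \\
S_{\text{redundant}} &= s_d
  \qquad \text{(partition predicate implied by the DateKey predicate)} \\
s_d = 0.01,\ s_p = 0.25:\quad S_{\text{backoff}} &= 0.01 \times \sqrt{0.25} = 0.005
\end{aligned}
```

Whichever predicate is more selective, the backoff result is always below s_d, so each seek feeding the join is underestimated.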

This explains the reduced cardinality estimates feeding the join. Because the misestimate makes this option look cheaper, the optimizer chooses a colocated join instead of a regular join, even though it is obvious to a human that it offers no benefit: only one partition can qualify for DateKey = 20140802.

There are several ways to work around this gap in the logic. One is the query hint USE HINT ('ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES'), but it affects the whole query, not just the problematic colocated join alternative. As Erik notes in his answer, you could also hint the use of the legacy CE.
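Applied to the query from the question (as reproduced in the other answer), the hint might look like this. ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES tells the estimator to treat AND-ed filter predicates as correlated, using the minimum single selectivity rather than exponential backoff.

```sql
SELECT a.DateKey, a.Type
FROM A AS a
  JOIN B AS b
    ON b.DateKey = a.DateKey
    AND b.Type = a.Type
WHERE a.DateKey = 20140802
-- Use minimum selectivity for AND-ed filter predicates instead of
-- exponential backoff. This applies to every filter in the query.
OPTION (USE HINT ('ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES'));
```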

For more information about colocated joins, see my article Improving Partitioned Table Join Performance.


This appears to be due to the new cardinality estimator introduced in SQL Server 2014.

If you instruct the query to use the legacy CE instead, you get a different plan and correct estimates.

SELECT a.DateKey, a.Type
FROM A AS a
  JOIN B AS b
    ON b.DateKey = a.DateKey
    AND b.Type = a.Type
WHERE a.DateKey = 20140802
OPTION(USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
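The USE HINT syntax above requires SQL Server 2016 SP1 or later. On SQL Server 2014 itself, a sketch of the equivalent uses trace flag 9481 at query level; note that QUERYTRACEON requires sysadmin membership (or a plan guide).

```sql
SELECT a.DateKey, a.Type
FROM A AS a
  JOIN B AS b
    ON b.DateKey = a.DateKey
    AND b.Type = a.Type
WHERE a.DateKey = 20140802
OPTION (QUERYTRACEON 9481); -- trace flag 9481: use the legacy (pre-2014) CE for this query only
```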


See these links for more information:

  • Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator
  • SQL Server Join Estimation using Histogram Coarse Alignment