Why SQL Server execution plan depends on comparison order

There are a number of issues here, but the most important is cardinality estimation (CE).

The newer ("default") CE model has a hard time with the predicates when it tries to compute selectivity against the histogram with no matching steps.

For example, the initial cardinality estimate returns a selectivity of 1 for:

select *
from dbo.tblEvent e
where
    1 = e.IntegrationEventStateId
    or 2 = e.IntegrationEventStateId
    or 5 = e.IntegrationEventStateId
    or (4 = e.IntegrationEventStateId and e.ModifiedDateUtc >= dateadd(minute, -5, getutcdate()))

as shown using trace flags 3604 and 2363:

Begin selectivity computation

Input tree:

  LogOp_Select
      CStCollBaseTable(ID=1, CARD=356525 TBL: dbo.tblEvent AS TBL: e)
      ScaOp_Logical x_lopOr
          ScaOp_Comp x_cmpEq
              ScaOp_Identifier QCOL: [e].IntegrationEventStateId
              ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=1)
          ScaOp_Comp x_cmpEq
              ScaOp_Identifier QCOL: [e].IntegrationEventStateId
              ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=2)
          ScaOp_Comp x_cmpEq
              ScaOp_Identifier QCOL: [e].IntegrationEventStateId
              ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=5)
          ScaOp_Logical x_lopAnd
              ScaOp_Comp x_cmpGe
                  ScaOp_Identifier QCOL: [e].ModifiedDateUtc
                  ScaOp_Identifier COL: ConstExpr1001 
              ScaOp_Comp x_cmpEq
                  ScaOp_Identifier QCOL: [e].IntegrationEventStateId
                  ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=4)

Plan for computation:

  CSelCalcCombineFilters_ExponentialBackoff (OR)
      CSelCalcCombineFilters_ExponentialBackoff (AND)
          CSelCalcColumnInInterval
              Column: QCOL: [e].ModifiedDateUtc
          CSelCalcColumnInInterval
              Column: QCOL: [e].IntegrationEventStateId
      CSelCalcColumnInInterval
          Column: QCOL: [e].IntegrationEventStateId

Loaded histogram for column QCOL: [e].ModifiedDateUtc from stats with id 3
Loaded histogram for column QCOL: [e].IntegrationEventStateId from stats with id 2

Selectivity: 1

Stats collection generated: 

  CStCollFilter(ID=2, CARD=356525)
      CStCollBaseTable(ID=1, CARD=356525 TBL: dbo.tblEvent AS TBL: e)

End selectivity computation

When cost-based optimization starts, and the input tree is in a slightly different form, the CE is asked to compute the selectivity of the simpler predicates:

Begin selectivity computation

Input tree:

  LogOp_Select
      CStCollBaseTable(ID=1, CARD=356525 TBL: dbo.tblEvent AS TBL: e)
      ScaOp_Logical x_lopOr
          ScaOp_Comp x_cmpEq
              ScaOp_Identifier QCOL: [e].IntegrationEventStateId
              ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=1)
          ScaOp_Comp x_cmpEq
              ScaOp_Identifier QCOL: [e].IntegrationEventStateId
              ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=2)
          ScaOp_Comp x_cmpEq
              ScaOp_Identifier QCOL: [e].IntegrationEventStateId
              ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=5)

Plan for computation:

  CSelCalcColumnInInterval
      Column: QCOL: [e].IntegrationEventStateId

Selectivity: 1

Stats collection generated: 

  CStCollFilter(ID=3, CARD=356525)
      CStCollBaseTable(ID=1, CARD=356525 TBL: dbo.tblEvent AS TBL: e)

End selectivity computation

This is the equivalent of:

SELECT *
FROM dbo.tblEvent AS TE 
WHERE TE.IntegrationEventStateId IN (1, 2, 5);

In both cases, the CE assesses that 100% of the rows will match, despite there being no histogram steps for the values 1, 2, or 5 (the sample data has values of 3 only). It is tempting to blame the CSelCalcColumnInInterval calculator for this, as it seems to treat {1, 2, 5} as a single interval {1:5}.

As is often the case, the "legacy" CE does a better (more detailed) job here, so you should find the following hint will produce much better plans:

OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

With the repro data, this produces a single seek and key lookup as one would hope.

"legacy" CE plan

Note that the seek performs four seeking operations, one for each disjoint predicate.

[1] Seek Keys[1]: Prefix: IntegrationEventStateId = 1
[2] Seek Keys[1]: Prefix: IntegrationEventStateId = 2
[3] Seek Keys[1]: Prefix: IntegrationEventStateId = 4, Start: ModifiedDateUtc >= dateadd(minute,(-5),getutcdate())
[4] Seek Keys[1]: Prefix: IntegrationEventStateId = 5

The new CE is designed to be more predictable, and easier to maintain/extend than the original CE. The "legacy" one had bits bolted on to it and refinements made over a long period of time. That complexity has benefits and pitfalls. Regressions and lower-quality estimates are somewhat expected with the newer CE. This ought to improve over time, but we are not there yet. I would view the behaviour shown here as a limitation of the calculator. Perhaps they will fix it.

See Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator.

The question of why the plan shape depends on textual representation is more of a side issue. The compilation process does contain logic (e.g. rule SelPredNorm) to rewrite predicates into a normalized form, and both repro queries are successfully rewritten to the same tree. This is done for various internal purposes, including index and computed column matching, and to make logical simplification easier to process.

Unfortunately, the rewritten form is only used prior to cost-based optimization. The input to the cost-based optimizer retains the differences in textual order present in the original query. I believe this is intentional, and done to prevent unexpected plan changes. People sometimes write queries in slightly different, and unusual, ways to achieve a particular plan shape. If the optimizer suddenly started defeating those attempts as logically redundant, people would be upset. This is arguably less of an issue with things like query store, and more efficient plan forcing, but these are relatively recent innovations.

In other words, the plans are different because people have relied on different text producing different plans in the past, and changing that now would be too disruptive.

Why SQL Server execution plan depends on comparison order

Tags:

Sql Server

Sql Optimization

Related

Recent Posts