Inner join elimination inhibited by prior outer join

Many of the simplifications performed before cost-based optimization are targeted at generated queries (ORMs and the like). These queries often follow patterns that result in logically redundant projections, selections, and joins.

There is a trade-off to be made here. Any number of rewrites and simplifications are logically possible. Each one needs to be assessed against the current tree and applied only if the local circumstances are suitable, and all of this takes time and resources. Rules run before cost-based optimization are considered for every query, even those with very little unoptimized cost, or those that will later qualify for a trivial plan.

For those reasons, the optimizer team were careful to include here only rules with a relatively low cost (implementation and runtime), and high applicability.

Consider: some rules are more difficult to implement than others. Some are more costly to evaluate than the potential gains justify. Some would introduce subtle bugs elsewhere in the optimizer code due to internal dependencies. Others are simply not common enough to make implementing them worthwhile. Still others would be easy to implement and commonly useful enough, but weren't thought of at the time and haven't been requested (loudly enough) since. One example is join elimination with multi-column relationships.
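A minimal sketch of the multi-column case, using hypothetical tables dbo.Parent and dbo.Child (they are not part of AdventureWorks). The two-column foreign key is trusted (created with the table) and its columns are NOT NULL, so every Child row matches exactly one Parent row and the join is logically redundant. Per the point above, SQL Server is not expected to remove it, though this may vary by version, so check the actual execution plan.

```sql
-- Hypothetical tables for illustration only.
CREATE TABLE dbo.Parent
(
    KeyA integer NOT NULL,
    KeyB integer NOT NULL,
    CONSTRAINT PK_Parent PRIMARY KEY (KeyA, KeyB)
);

CREATE TABLE dbo.Child
(
    ChildID integer NOT NULL PRIMARY KEY,
    KeyA integer NOT NULL,
    KeyB integer NOT NULL,
    -- Trusted multi-column relationship (created WITH CHECK by default).
    CONSTRAINT FK_Child_Parent
        FOREIGN KEY (KeyA, KeyB) REFERENCES dbo.Parent (KeyA, KeyB)
);

-- Only Child columns are projected, so the join to Parent adds nothing
-- logically; the multi-column relationship is what stops elimination.
SELECT C.ChildID
FROM dbo.Child AS C
JOIN dbo.Parent AS P
    ON P.KeyA = C.KeyA
    AND P.KeyB = C.KeyB;
```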

An example relevant to your question, using the same schema:

-- Join eliminated
SELECT SOD.ProductID 
FROM Sales.SalesOrderDetail AS SOD
LEFT JOIN Production.Product AS P
    ON P.ProductID = SOD.ProductID;

-- Join not eliminated when projecting from the non-preserved side of the join
SELECT P.ProductID 
FROM Sales.SalesOrderDetail AS SOD
LEFT JOIN Production.Product AS P
    ON P.ProductID = SOD.ProductID;

The join is not eliminated there, though we might argue P.ProductID and SOD.ProductID are guaranteed identical in all respects by the logic and schema. More to the current point, the outer join in the second query is not converted to an inner join, which would allow the simplification targeted by the question.

Again, this is not because the SQL Server optimizer developers were stupid or lazy. This sort of thing just isn't common enough to be worth checking for on every compilation.

In general, to get the best out of join simplification and elimination, you should construct written joins in a logical order (e.g. joined tables adjacent) and ensure the four conditions noted by Rob Farley are met.

Reordering joins

It is possible, but often complex and expensive, to move outer joins around other joins in some limited contexts. These transformations are tricky, so the vast majority of this type of effort is limited to the search 2 (full optimization) stage of cost-based optimization. Even so, relatively few of the logical possibilities here have been researched and/or implemented in SQL Server.

It is all too easy to change semantics unintentionally during transforms of this kind. For an introductory discussion, see Be Careful When Mixing INNER and OUTER Joins by Jeff Smith. For more detail, there is a wide range of technical papers, for example Outerjoin Simplification and Reordering for Query Optimization by César A. Galindo-Legaria (Microsoft) and Arnon Rosenthal.
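A minimal sketch of the classic pitfall, using hypothetical tables dbo.A, dbo.B, and dbo.C with assumed columns ID, AID, and BID. An inner join that references the null-extended side of an earlier outer join rejects the null-extended rows, so the outer join silently behaves like an inner join. Nesting the inner join inside the outer join (the same syntax used in the rewrite at the end of this answer) keeps every row of A.

```sql
-- Hypothetical tables and columns, for illustration only.

-- Rows of A with no match in B get B.ID = NULL. The predicate
-- C.BID = B.ID is then UNKNOWN, so those rows are discarded:
-- the LEFT JOIN effectively becomes an INNER JOIN.
SELECT A.ID AS AID, B.ID AS BID, C.ID AS CID
FROM dbo.A AS A
LEFT JOIN dbo.B AS B
    ON B.AID = A.ID
JOIN dbo.C AS C
    ON C.BID = B.ID;

-- B is inner joined to C first; A is then outer joined to that result,
-- so every row of A is preserved.
SELECT A.ID AS AID, B.ID AS BID, C.ID AS CID
FROM dbo.A AS A
LEFT JOIN dbo.B AS B
    JOIN dbo.C AS C
        ON C.BID = B.ID
    ON B.AID = A.ID;
```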

Heuristic join reorder does make some efforts to reorganize cross joins, inner joins, and outer joins, but these efforts are very much at the lightweight end of the spectrum for all the reasons previously noted.

I'll leave you with this fun rewrite that does allow elimination:

SELECT P.[Name]
FROM Production.Product AS P
RIGHT JOIN Sales.SalesOrderDetail AS SOD
JOIN Sales.SalesOrderHeader AS SOH
    ON SOH.SalesOrderID = SOD.SalesOrderID
    ON SOD.ProductID = P.ProductID;

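The nested ON syntax in the rewrite above can be hard to read. A sketch of the same query with explicit parentheses around the inner join: the parentheses only make the existing nesting visible, so it should be logically identical, but it is worth confirming in the execution plan that SOH is still eliminated.

```sql
-- Same logical query as the rewrite; parentheses show that
-- SOD and SOH are inner joined before the outer join to Product.
SELECT P.[Name]
FROM Production.Product AS P
RIGHT JOIN
(
    Sales.SalesOrderDetail AS SOD
    JOIN Sales.SalesOrderHeader AS SOH
        ON SOH.SalesOrderID = SOD.SalesOrderID
)
    ON SOD.ProductID = P.ProductID;
```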


As Lennart mentioned:

You may find the following articles interesting: https://dzone.com/articles/cool-sql-optimizations-that-do-not-depend-on-the-c and https://dzone.com/articles/cool-sql-optimizations-that-do-not-depend-on-the-c-1 They compare a number of DBMSs (including SQL Server 2014) for "algebraic" optimizations that do not rely on the cost model.

Those are mostly accurate for SQL Server, with the exception of section 4, Removing “Silly” Predicates, which doesn't reflect that SQL Server differentiates between EQ (equal, null-rejecting) and IS (null-aware) comparisons. To be clear, SQL Server does support this simplification.
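A minimal illustration of the EQ/IS distinction, using a hypothetical table dbo.T with a nullable column X. The two predicates look equally "silly" but mean different things, which is why a simplification must know which kind of comparison it is dealing with. Whether a particular SQL Server version rewrites either predicate should be confirmed from the execution plan.

```sql
-- Hypothetical table with a nullable column, for illustration only.
CREATE TABLE dbo.T (X integer NULL);

-- EQ comparison (null-rejecting): NULL = NULL evaluates to UNKNOWN,
-- so this predicate is not always true; it is equivalent to X IS NOT NULL.
SELECT X FROM dbo.T WHERE X = X;

-- IS comparison (null-aware): NULLs are treated as not distinct,
-- so this predicate is true for every row.
-- IS [NOT] DISTINCT FROM requires SQL Server 2022 or later.
SELECT X FROM dbo.T WHERE X IS NOT DISTINCT FROM X;
```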


Yes, there are scenarios in which the join elimination phase doesn’t eliminate as much as it should. It often happens in scenarios where nulls are involved, including cases where things are null because of left joins. I remember several years ago, discussing with Paul White that you could help this scenario through using an explicit “AND SomeJoinColumn IS NOT NULL” (sod.SalesOrderID here, I guess). We were convinced it was a bug, but that it was unlikely to get onto Microsoft’s radar as it didn’t affect correctness. I can’t test it today, but have a look and see if that helps the elimination. I can always edit this answer later.
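A sketch of the suggestion, untested as noted above. The query shape is an assumption: it reuses the tables and join predicates from the rewrite in the other answer, with the outer join written first. The added IS NOT NULL predicate is logically redundant here; the idea is only that it might help join elimination.

```sql
-- Assumed query shape, for illustration only.
SELECT P.[Name]
FROM Sales.SalesOrderDetail AS SOD
LEFT JOIN Production.Product AS P
    ON P.ProductID = SOD.ProductID
JOIN Sales.SalesOrderHeader AS SOH
    ON SOH.SalesOrderID = SOD.SalesOrderID
    -- Explicit, logically redundant null check on the join column.
    AND SOD.SalesOrderID IS NOT NULL;
```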

Oh, and when I first presented on this back in 2009, I hadn’t noticed this behaviour. When I became aware of it, it seemed too much of an edge case to incorporate into my presentations.