Why does nested loops join only support left joins?

What I don't like in the linked article is the statement that "nested loop join algorithm does not support the logical join operator of right join".

While there is a limitation, the wording at this point is a bit confusing. I hope the following explains things better:

The nested loop join algorithm involves two tables (whether they are base tables or result sets of previous operations is irrelevant). They are called the outer and the inner table, and the algorithm treats them differently: the "outer" table is traversed in the outer loop and the "inner" table in the inner loop.

So, let's say we have a join:

A (some_type) JOIN B

The algorithm can be executed as either:

outer-loop-A  nested-loop  inner-loop-B

or:

outer-loop-B  nested-loop  inner-loop-A

Now, if (some_type) is an INNER or CROSS join, there is no limitation: the planner can choose either of the two ways. They have different performance characteristics, depending on the size of the sets, the distribution of values in the joined columns, indexes, etc. Usually the smaller table is chosen as the "outer" table in the algorithm.
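To illustrate why the loop order is free for an inner join, here is a minimal Python sketch. The function name `nested_loop_inner_join` and the sample lists are made up for this illustration and are not taken from any database engine. It runs the same join with A as the outer table and then with B as the outer table, and both orders produce the same matching pairs.

```python
def nested_loop_inner_join(outer, inner, match):
    """Emit (outer_row, inner_row) for every pair that satisfies match."""
    result = []
    for o in outer:          # outer loop: each outer row is visited once
        for i in inner:      # inner loop: restarted for every outer row
            if match(o, i):
                result.append((o, i))
    return result


# Hypothetical sample data, one column per table.
A = [1, 2]
B = [1, 3]

# outer-loop-A  nested-loop  inner-loop-B
a_outer = nested_loop_inner_join(A, B, lambda a, b: a == b)

# outer-loop-B  nested-loop  inner-loop-A, with each pair swapped back to (a, b)
b_outer = [(a, b) for (b, a) in nested_loop_inner_join(B, A, lambda b, a: b == a)]

print(sorted(a_outer) == sorted(b_outer))
```

Only matched pairs are ever emitted, so nothing depends on which table drives the outer loop. That is no longer true once unmatched rows have to be preserved.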

But when (some_type) is a LEFT join, it can only use:

outer-loop-A  nested-loop  inner-loop-B

and not

outer-loop-B  nested-loop  inner-loop-A

And since a RIGHT join can always be rewritten as a LEFT join, it has the same limitation, in reverse. For A RIGHT JOIN B (which can be rewritten as B LEFT JOIN A) it can only use:

outer-loop-B  nested-loop  inner-loop-A

and not the other way around (*).

The same limitation applies to left-semijoin, left-anti-semijoin, right-semijoin and right-anti-semijoin.

The FULL join, on the other hand, cannot be handled directly by a nested loop join algorithm. The article explains very well (near the end) how a full join can be rewritten, and is rewritten by the optimizer, as a union of a left join and a left anti-semijoin, which might then be planned as two nested loops (and a union).
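The rewrite described above can be sketched in Python as follows. This is only an illustration of the logical idea (a left join, plus the rows of the other table that have no match), not a reproduction of the plan the optimizer builds. The helper names `left_join`, `left_anti_semi_join` and `full_join` are hypothetical.

```python
def left_join(left, right, match):
    """Nested loop left join: every left row is kept, padded with None if unmatched."""
    result = []
    for l in left:
        found = False
        for r in right:
            if match(l, r):
                result.append((l, r))
                found = True
        if not found:
            result.append((l, None))
    return result


def left_anti_semi_join(left, right, match):
    """Nested loop anti-semijoin: keep the left rows that have no match at all."""
    return [l for l in left if not any(match(l, r) for r in right)]


def full_join(a, b, match):
    """A FULL JOIN B as (A LEFT JOIN B) plus the unmatched rows of B."""
    rows = left_join(a, b, match)
    # The anti-semijoin is driven by B, so the predicate arguments are swapped.
    unmatched_b = left_anti_semi_join(b, a, lambda b_row, a_row: match(a_row, b_row))
    rows.extend((None, b_row) for b_row in unmatched_b)
    return rows


result = full_join([1, 2], [1, 3], lambda a, b: a == b)
print(result)  # [(1, 1), (2, None), (None, 3)]
```

Each of the two pieces only preserves the rows of the table that drives its own outer loop, which is why both fit the plain nested loop shape.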

(*) As Dudu Markovitz explains in his answer, the reverse order could be used, but only if the nested loop join algorithm were modified to keep an extra structure and perform an extra step at the end.

The main issue here is implementing an outer join with nested loops in a technical order that is the opposite of the logical one, i.e. the inner table is accessed through the outer loop and the outer table is accessed through the inner loop.

Given tables A and B, let's implement A LEFT JOIN B.

A
--
1
2

B
--
1
3

First, let's do it in the "natural" way.

We iterate through A.
We access record 1.
We iterate through B.
We find record 1 in B and output 1-1.

We keep iterating through A.
We access record 2.
We iterate through B.
We don't find any match in B.
We output 2-null.
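
The "natural" walk-through above can be written as a short Python sketch. The function name `nested_loop_left_join` is made up for this illustration, and `None` stands in for null.

```python
def nested_loop_left_join(a_rows, b_rows):
    """A LEFT JOIN B with A in the outer loop and B in the inner loop."""
    output = []
    for a in a_rows:                 # iterate through A
        found = False
        for b in b_rows:             # iterate through B for this A record
            if a == b:
                output.append((a, b))
                found = True
        if not found:
            # The inner loop has finished, so we know right here that this
            # A record has no match and can emit it padded with null.
            output.append((a, None))
    return output


natural = nested_loop_left_join([1, 2], [1, 3])
print(natural)  # [(1, 1), (2, None)]
```

The important detail is that the "no match" decision for an A record is local: it is known as soon as the inner loop for that record ends, so no state has to survive between outer iterations.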

Now, let's do it in the "opposite" way.

We iterate through B.
We access record 1.
We iterate through A.
We find record 1 in A and output 1-1.

We keep iterating through B.
We access record 3.
We iterate through A.
We don't find any match in A.

Now remember that it was A LEFT JOIN B, which means that in addition to 1-1 we should output 2-null.
The problem is that at that point we have no idea which records in A already have a match (1) and which don't (2).

This can actually be solved in various ways, e.g. by holding a bit array for table A.
When an A record is found as a match, we mark it in the bit array.
At the end of the nested loops, we go through the bit array and output any record that was not marked.
This is obviously more complicated than the "natural" nested loop.
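
A minimal Python sketch of the bit array idea, under the assumption that table A fits in a list so that records can be marked by position. The function name `reversed_nested_loop_left_join` is hypothetical, and a list of booleans plays the role of the bit array.

```python
def reversed_nested_loop_left_join(a_rows, b_rows):
    """A LEFT JOIN B with B in the outer loop and A in the inner loop."""
    matched = [False] * len(a_rows)      # the extra structure: one flag per A record
    output = []
    for b in b_rows:                     # iterate through B
        for idx, a in enumerate(a_rows): # iterate through A for this B record
            if a == b:
                output.append((a, b))
                matched[idx] = True      # remember that this A record had a match
    # The extra step: only after both loops are done do we know
    # which A records never matched, so they are emitted at the end.
    for idx, a in enumerate(a_rows):
        if not matched[idx]:
            output.append((a, None))
    return output


opposite = reversed_nested_loop_left_join([1, 2], [1, 3])
print(opposite)  # [(1, 1), (2, None)]
```

The result is the same as with the "natural" order, but it costs memory proportional to the size of A and a final pass over it, and the unmatched rows cannot be emitted until the whole join has run.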

