Difference between Hash, Merge and Loop join?

From MSDN, in the topic of Advanced Query Tuning Concepts:

SQL Server employs three types of join operations:

  • Nested loops joins

  • Merge joins

  • Hash joins

If one join input is small (fewer than 10 rows) and the other join input is fairly large and indexed on its join columns, an index nested loops join is the fastest join operation because it requires the least I/O and the fewest comparisons. For more information about nested loops, see Understanding Nested Loops Joins.

If the two join inputs are not small but are sorted on their join column (for example, if they were obtained by scanning sorted indexes), a merge join is the fastest join operation. If both join inputs are large and the two inputs are of similar sizes, a merge join with prior sorting and a hash join offer similar performance. However, hash join operations are often much faster if the two input sizes differ significantly from each other. For more information, see Understanding Merge Joins.

Hash joins can efficiently process large, unsorted, nonindexed inputs.

But I believe you should start with the more basic topic of Query Tuning and only then move on to using query hints.
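To see why the guidance above depends on input sizes and sort order, here is a rough back-of-the-envelope model, written as a sketch in Python. It assumes textbook costs for N outer rows and M inner rows: about log2(M) comparisons per index seek, one pass over each input for a merge or hash join, and an extra N log N + M log M sort when the merge join inputs are unsorted. It ignores constants, memory grants, and I/O, so it only shows how the costs scale, not what the optimizer would actually pick. `rough_costs` is a hypothetical helper, not anything SQL Server exposes.

```python
import math


def rough_costs(n, m, inputs_sorted=True):
    """Rough comparison counts for joining n outer rows to m inner rows (n, m >= 1).

    Constants, memory, and I/O are ignored; this only illustrates scaling.
    """
    # Nested loops: one index seek (about log2(m) comparisons) per outer row.
    nested_loops = n * math.log2(m)
    # Merge join: a single pass over each input, assuming both are sorted.
    merge = n + m
    if not inputs_sorted:
        # Unsorted inputs must be sorted before they can be merged.
        merge += n * math.log2(n) + m * math.log2(m)
    # Hash join: build on one input, probe with the other.
    hash_join = n + m
    return {"nested_loops": nested_loops, "merge": merge, "hash": hash_join}
```

With 10 outer rows against 1,000,000 indexed inner rows, the nested loops estimate is about 200 versus roughly a million for the other two. With two sorted million-row inputs, merge and hash come out equal and well below nested loops; with two unsorted million-row inputs, the hash estimate is the smallest because the merge join pays for the sorts.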


This article explains it well: https://www.linkedin.com/pulse/loop-hash-merge-join-types-eitan-blumin

(Assume N and M are the numbers of rows in the two tables being joined.)

Nested Loop Join

  • Complexity: O(N log M)
  • Usually used when one table is significantly smaller than the other
  • The larger table has an index that allows seeking into it on the join key
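The nested loop bullets above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not how SQL Server implements the operator: rows are modeled as hypothetical (key, payload) tuples, and sorting the inner input once stands in for an existing index on its join column. Each outer row then does a binary-search seek with `bisect_left`, which is where the log M per row comes from.

```python
from bisect import bisect_left


def index_nested_loops_join(outer, inner):
    """Join two lists of (key, payload) tuples on equal keys.

    Sorting `inner` once simulates an existing index on its join column;
    the seek phase then costs roughly O(N log M) comparisons.
    """
    # Simulated index: inner rows ordered by key, plus a parallel list of keys.
    index_rows = sorted(inner, key=lambda row: row[0])
    index_keys = [key for key, _ in index_rows]

    result = []
    for key, outer_payload in outer:
        # Seek: locate the first index entry whose key is >= the outer key.
        pos = bisect_left(index_keys, key)
        # Scan forward over every inner row with the same key (duplicates allowed).
        while pos < len(index_keys) and index_keys[pos] == key:
            result.append((key, outer_payload, index_rows[pos][1]))
            pos += 1
    return result
```

For example, joining two outer rows `[(7, "order-1"), (9, "order-2")]` to 1,000 inner rows keyed 0 to 999 performs only two seeks instead of scanning all 1,000 rows twice.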

Merge Join

  • Complexity: O(N+M)
  • Both inputs are sorted on the join key
  • An equality operator is used
  • Excellent for very large tables
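A minimal merge join sketch in Python, again assuming hypothetical (key, payload) tuples that are already sorted by key (for example, read from sorted indexes). Each input is read once in key order, so the merge costs about O(N + M) steps plus the size of the output. When both sides contain duplicate keys, the runs of equal keys are combined as a cross product.

```python
def merge_join(left, right):
    """Equi-join two lists of (key, payload) tuples that are sorted by key.

    Assumes both inputs are already sorted; unsorted input silently
    produces wrong results.
    """
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1  # advance the side with the smaller key
        elif lkey > rkey:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lkey:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rkey:
                j_end += 1
            # Emit every pairing within the two runs.
            for _, lpayload in left[i:i_end]:
                for _, rpayload in right[j:j_end]:
                    result.append((lkey, lpayload, rpayload))
            i, j = i_end, j_end
    return result
```

For example, `merge_join([(1, "a"), (2, "b"), (2, "c"), (4, "d")], [(2, "x"), (3, "y"), (4, "z")])` yields `[(2, "b", "x"), (2, "c", "x"), (4, "d", "z")]` without ever moving backward in either input.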

Hash Join

  • Complexity: O(N·h_c + M·h_m + J), or O(N + M) if you ignore resource consumption costs
  • Often a last-resort join type, used when the inputs are large, unsorted, or not indexed on the join key
  • Uses a hash table and a dynamic hash match function to match rows
  • Higher cost in memory consumption, and in disk I/O when the hash table does not fit in memory
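The hash join bullets above can be sketched in Python as a build phase followed by a probe phase. As before, rows are hypothetical (key, payload) tuples, and neither input needs to be sorted or indexed. This sketch keeps the whole hash table in memory; it does not model what SQL Server does when the build input exceeds its memory grant and spills to tempdb, which is where the extra disk I/O comes from.

```python
from collections import defaultdict


def hash_join(build, probe):
    """Equi-join two unsorted lists of (key, payload) tuples.

    Pass the smaller input as `build`, since it is held in memory.
    """
    # Build phase: hash the build input on its join key
    # (a list per key handles duplicate keys).
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)

    # Probe phase: stream the other input and do one hash lookup per row.
    result = []
    for key, probe_payload in probe:
        for build_payload in table.get(key, []):
            result.append((key, build_payload, probe_payload))
    return result
```

For example, `hash_join([(3, "c"), (1, "a"), (3, "cc")], [(1, "x"), (2, "y"), (3, "z")])` returns `[(1, "a", "x"), (3, "c", "z"), (3, "cc", "z")]` even though neither input is in key order. Building on the smaller input keeps the in-memory table small, which is one reason hash joins do well when the two input sizes differ significantly.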
