Unusual column comparisons and query performance

I don't like ISNULL with sentinel values, it requires picking values that can't ever legitimately appear in the data now or forever after and personally I find expressions containing these more difficult to reason about.

For your test rig I tried four different ways of expressing the query and got the stated results.

SELECT COUNT(1)
FROM   dbo.l
       CROSS JOIN dbo.r
WHERE  r.hash <> l.hash
        OR ( r.hash IS NULL
             AND l.hash IS NOT NULL )
        OR ( l.hash IS NULL
             AND r.hash IS NOT NULL )

SQL Server Execution Times: CPU time = 30968 ms, elapsed time = 8230 ms.

SELECT COUNT(1)
FROM   dbo.l
       CROSS JOIN dbo.r
WHERE  ISNULL(r.hash, 0) <> ISNULL(l.hash, 0)

SQL Server Execution Times: CPU time = 31594 ms, elapsed time = 9230 ms.

SELECT COUNT(1)
FROM   dbo.l
       CROSS JOIN dbo.r
WHERE  EXISTS(SELECT r.hash
              EXCEPT
              SELECT l.HASH)

SQL Server Execution Times: CPU time = 46531 ms, elapsed time = 13191 ms.

SELECT COUNT(1)
FROM   dbo.l
       CROSS JOIN dbo.r
WHERE  NOT EXISTS(SELECT r.hash
                  INTERSECT
                  SELECT l.HASH) 

SQL Server Execution Times: CPU time = 23812 ms, elapsed time = 6760 ms.

So on that basis the last one would be a clear winner - along with a code comment with a link to Undocumented Query Plans: Equality Comparisons for anyone unfamiliar with the pattern.

But you should test whether this pattern is reproducible with your actual MERGE query too.


I would agree with your findings. The EXCEPT pattern that your consultants wrote handles NULL values, and is also a really nice way to manage a very large number of comparison columns.

But if the column cannot be NULL and performance suffers, I suppose there's no point in keeping it.

Your test could be inaccurate in the following regards:

  • The original queries a view, while your test queries a heap.
  • The original query may have a considerably different data volume, whereas you're trying 100 000 rows. Plans are generated differently, depending on the estimated number of rows they will result in.
  • Is the original column really varbinary(8000)? That sounds awfully wide for a hash value, and column widths affect memory grants, and ultimately performance.

This pattern may be a general solution for column comparisons, generated by a BIML package or similar ETL tool.