Best practice between using LEFT JOIN or NOT EXISTS

The biggest difference is not in the join vs not exists, it is (as written), the SELECT *.

On the first example, you get all columns from both A and B, whereas in the second example, you get only columns from A.

In SQL Server, the second variant is slightly faster in a very simple contrived example:

Create two sample tables:

CREATE TABLE dbo.A
(
    A_ID INT NOT NULL
        PRIMARY KEY CLUSTERED
        IDENTITY(1,1)
);

CREATE TABLE dbo.B
(
    B_ID INT NOT NULL
        PRIMARY KEY CLUSTERED
        IDENTITY(1,1)
);
GO

Insert 10,000 rows into each table:

INSERT INTO dbo.A DEFAULT VALUES;
GO 10000

INSERT INTO dbo.B DEFAULT VALUES;
GO 10000

Remove every 5th row from the second table:

DELETE 
FROM dbo.B 
WHERE B_ID % 5 = 1;

SELECT COUNT(*) -- shows 10,000
FROM dbo.A;

SELECT COUNT(*) -- shows  8,000
FROM dbo.B;

Perform the two test SELECT statement variants:

SELECT *
FROM dbo.A
    LEFT JOIN dbo.B ON A.A_ID = B.B_ID
WHERE B.B_ID IS NULL;

SELECT *
FROM dbo.A
WHERE NOT EXISTS (SELECT 1
    FROM dbo.B
    WHERE b.B_ID = a.A_ID);

Execution plans:

enter image description here

The second variant does not need to perform the filter operation since it can use the left anti-semi join operator.


Logically they are identical, but NOT EXISTS is closer to the AntiSemiJoin that you're asking for, and is generally preferred. It also highlights better that you can't access the columns in B, because it's only used as a filter (as opposed to having them available with NULL values).

Many years ago (SQL Server 6.0 ish), LEFT JOIN was quicker, but that hasn't been the case for a very long time. These days, NOT EXISTS is marginally faster.


The biggest impact in Access is that the JOIN method has to complete the join before filtering it, constructing the joined set in memory. Using NOT EXISTS it checks for the row but doesn't allocate space for the columns. Plus, it stops looking once it finds a row. Performance varies a bit more in Access, but a general rule of thumb is that NOT EXISTS tends to be a little faster. I'd be less inclined to say it's "best practice", as there are more factors involved.


An exception I've noticed to the NOT EXISTS being superior (however marginally) to LEFT JOIN ... WHERE IS NULL is when using Linked Servers.

From examining the execution plans, it appears that NOT EXISTS operator gets executed in a nested loop fashion. Whereby it is executed on a per row basis (which I suppose makes sense).

Example execution plan demonstrating this behaviour: enter image description here