Performance of a=0 and b=0 and ... z=0 vs a+b+c+d=0

In your question, you detail some tests that you've prepared where you "prove" that the addition option is quicker than comparing the discrete columns. I suspect your test methodology may be flawed in several ways, as @gbn and @srutzky have alluded to.

First, you need to ensure you're not testing SQL Server Management Studio (or whatever client you're using). For instance, if you run a SELECT * against a table with 3 million rows, you're mostly testing SSMS's ability to pull rows from SQL Server and render them on screen. You're far better off using something like SELECT COUNT(1), which avoids pulling millions of rows across the network and rendering them in the client.
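
For example (dbo.SomeBigTable is just a stand-in for whatever large table you're testing against):

-- Mostly measures the client: every row travels across the network
-- and SSMS has to render it.
SELECT *
FROM dbo.SomeBigTable;

-- Mostly measures the server: only a single row comes back, so network
-- transfer and client rendering are negligible.
SELECT COUNT(1)
FROM dbo.SomeBigTable;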

Second, you need to be aware of SQL Server's data cache. Typically, we test the speed of reading data from storage and processing it starting from a cold cache (i.e., SQL Server's buffers are empty). Occasionally it makes sense to do all your testing with a warm cache, but you need to approach your testing explicitly with that in mind.

For a cold-cache test, you need to run CHECKPOINT and DBCC DROPCLEANBUFFERS prior to each run of the test.
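
In script form (don't run this against a production instance; DBCC DROPCLEANBUFFERS empties the buffer pool for the whole server):

-- Flush dirty pages to disk first, so DROPCLEANBUFFERS can actually empty the cache.
CHECKPOINT;

-- Remove all clean pages from the buffer pool; the next test run reads from disk again.
DBCC DROPCLEANBUFFERS;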

For the test you've asked about in your question, I created the following test-bed:

IF COALESCE(OBJECT_ID('tempdb..#SomeTest'), 0) <> 0
BEGIN
    DROP TABLE #SomeTest;
END
CREATE TABLE #SomeTest
(
    TestID INT NOT NULL
        PRIMARY KEY 
        IDENTITY(1,1)
    , A INT NOT NULL
    , B FLOAT NOT NULL
    , C MONEY NOT NULL
    , D BIGINT NOT NULL
);

INSERT INTO #SomeTest (A, B, C, D)
SELECT o1.object_id, o2.object_id, o3.object_id, o4.object_id
FROM sys.objects o1
    , sys.objects o2
    , sys.objects o3
    , sys.objects o4;

SELECT COUNT(1) 
FROM #SomeTest;

This returns a count of 260,144,641 rows on my machine; sys.objects happened to contain 127 rows, and the four-way cross join produces 127^4 = 260,144,641 rows.

To test the "addition" method, I run:

CHECKPOINT 5;
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;

SET STATISTICS IO, TIME ON;
GO
SELECT COUNT(1)
FROM #SomeTest st
WHERE (st.A + st.B + st.C + st.D) = 0;
GO
SET STATISTICS IO, TIME OFF;

The messages tab shows:

Table '#SomeTest'. Scan count 3, logical reads 1322661, physical reads 0, read-ahead reads 1313877, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 49047 ms, elapsed time = 173451 ms.

For the "discrete columns" test:

CHECKPOINT 5;
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;

SET STATISTICS IO, TIME ON;
GO
SELECT COUNT(1)
FROM #SomeTest st
WHERE st.A = 0
    AND st.B = 0
    AND st.C = 0
    AND st.D = 0;
GO
SET STATISTICS IO, TIME OFF;

Again, from the messages tab:

Table '#SomeTest'. Scan count 3, logical reads 1322661, physical reads 0, read-ahead reads 1322661, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 8938 ms, elapsed time = 162581 ms.

From the stats above you can see that with the second variant, where the discrete columns are compared to 0, the elapsed time is about 10 seconds shorter and the CPU time is roughly one sixth of the addition variant's. The long durations in my tests above are mostly the result of reading a lot of rows from disk. If you drop the number of rows to 3 million, the ratios stay about the same, but the elapsed times drop noticeably since disk I/O has much less of an effect.
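
One way to build that smaller test-bed is to cap the cross join with TOP, something like this sketch (not necessarily how the numbers below were produced):

-- Empty the table, then reload only 3,000,000 rows from the same cross join.
TRUNCATE TABLE #SomeTest;

INSERT INTO #SomeTest (A, B, C, D)
SELECT TOP (3000000) o1.object_id, o2.object_id, o3.object_id, o4.object_id
FROM sys.objects o1
    , sys.objects o2
    , sys.objects o3
    , sys.objects o4;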

With the "Addition" method:

Table '#SomeTest'. Scan count 3, logical reads 15255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 499 ms, elapsed time = 256 ms.

With the "discrete columns" method:

Table '#SomeTest'. Scan count 3, logical reads 15255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 94 ms, elapsed time = 53 ms.

What will make a really big difference for this test? An appropriate index, such as:

CREATE INDEX IX_SomeTest ON #SomeTest(A, B, C, D);

The "addition" method:

Table '#SomeTest'. Scan count 3, logical reads 14235, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 546 ms, elapsed time = 314 ms.

The "discrete columns" method:

Table '#SomeTest'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

The execution plan for each query (with the above index in place) is quite telling.

The "addition" method, which must perform a scan of the entire index:

(screenshot: execution plan with an index scan)

and the "discrete columns" method, which can seek to the first row of the index where the leading index column, A, is zero:

(screenshot: execution plan with an index seek)
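
If you'd rather capture those plans as text than as screenshots, one option is SET STATISTICS XML; a sketch using the discrete-columns query (the addition query works the same way):

SET STATISTICS XML ON;
GO
-- The actual execution plan comes back as an extra XML result set alongside the count.
SELECT COUNT(1)
FROM #SomeTest st
WHERE st.A = 0
    AND st.B = 0
    AND st.C = 0
    AND st.D = 0;
GO
SET STATISTICS XML OFF;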


Let's say you have an index on A, B, C, and D. It could be a filtered index, too (see the sketch below).

This is more likely to use the index than the addition is:

WHERE A = 0 AND B = 0 AND C = 0 AND D = 0
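
For example, a filtered index along these lines (dbo.YourTable, the index name, and the filter are illustrative, not taken from the question):

-- A filtered index that only contains the rows where every column is zero,
-- so the query above can be answered from a very small structure.
CREATE INDEX IX_YourTable_AllZero
ON dbo.YourTable (A, B, C, D)
WHERE A = 0 AND B = 0 AND C = 0 AND D = 0;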

In other news, if A is -1 and B is 1, then A + B = 0 is true but A = 0 AND B = 0 is false.
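
A quick throwaway query shows the difference (not part of any test above):

-- Returns 'matches' for the addition form and 'does not match' for the discrete form.
SELECT CASE WHEN -1 + 1 = 0 THEN 'matches' ELSE 'does not match' END AS AdditionPredicate
     , CASE WHEN -1 = 0 AND 1 = 0 THEN 'matches' ELSE 'does not match' END AS DiscretePredicate;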


(Please note that this answer was submitted prior to any testing being noted in the Question: the text of the Question ended just above the Test results section.)

I would guess that the separate AND conditions would be preferred, since the optimizer would be more likely to short-circuit the operation as soon as a single one of them is not equal to 0, without needing to do the computation first.

Still, since this is a question of performance, you should first set up a test to determine the answer on your hardware. Report those results, showing your test code, and ask others to look it over to make sure it was a good test. There may be other factors worth considering that you haven't thought of.