Quick way to validate two tables against each other

Here's what I've done before:

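-- Rows in TableA with no identical row in TableB
(SELECT 'TableA' AS SourceTable, * FROM TableA
EXCEPT
SELECT 'TableA', * FROM TableB)
UNION ALL
-- Rows in TableB with no identical row in TableA
(SELECT 'TableB' AS SourceTable, * FROM TableB
EXCEPT
SELECT 'TableB', * FROM TableA)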

It's worked well enough on tables that are about 1,000,000 rows, but I'm not sure how well that would work on extremely large tables.

Added:

I've run the query on my system, comparing two tables with 21 fields of regular types in two different databases attached to the same server running SQL Server 2005. Each table has about 3 million rows, and about 25,000 rows differ. The primary key on the table is unusual, however: it's a composite key of 10 fields (it's an audit table).

The execution plans for the queries have a total cost of 184.25879 for UNION and 184.22983 for UNION ALL. The tree cost differs only on the last step before returning rows, the concatenation.

Actually executing either query takes about 42s, plus about 3s to transmit the rows. Both queries take the same amount of time.

Second Addition:

This is actually extremely fast, each one running against 3 million rows in about 2.5s:

SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM TableA

SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM TableB

If the results of those don't match, you know the tables are different. However, if the results do match, you're not guaranteed that the tables are identical because of the [highly unlikely] chance of checksum collisions.

I'm not sure how datatype differences between the tables would affect this calculation. To check for them, I would run the same kind of comparison against the system views or the INFORMATION_SCHEMA views.
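
One way to do that schema check is to apply the same EXCEPT pattern to INFORMATION_SCHEMA.COLUMNS. This is only a rough sketch, assuming both tables sit in the dbo schema of the current database. For tables in two different databases, each view would need a database prefix such as DbA.INFORMATION_SCHEMA.COLUMNS, where DbA is a placeholder name.

```sql
-- Column definitions of TableA that have no exact match in TableB.
-- Swap the two table names to check the other direction.
-- ORDINAL_POSITION is included because SELECT * comparisons match columns by position.
SELECT COLUMN_NAME, ORDINAL_POSITION, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH,
       NUMERIC_PRECISION, NUMERIC_SCALE, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'TableA'   -- 'dbo' is an assumption
EXCEPT
SELECT COLUMN_NAME, ORDINAL_POSITION, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH,
       NUMERIC_PRECISION, NUMERIC_SCALE, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'TableB';
```

If this returns no rows in either direction, the two tables have the same column names, order, types, sizes, and nullability.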

I tried the query against another table with 5 million rows and that one ran in about 5s, so it appears to scale roughly as O(n).


Here are several ideas that might help:

  1. Try a different data diff tool - have you tried Idera's SQL Comparison toolset or ApexSQL Data Diff? I realize that you already paid for Red Gate, but you can still use these in trial mode to get the job done ;).

  2. Divide and conquer - how about splitting the tables into 10 smaller tables that can be handled by some commercial data comparison tool?

  3. Limit yourself only to some columns - do you really need to compare data in all columns?
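
Ideas 2 and 3 can also be combined in plain T-SQL without physically splitting anything. The sketch below is an illustration only: it assumes a hypothetical integer key column Id and two compared columns Col1 and Col2, none of which come from the question, and it uses an arbitrary bucket size of 100,000 keys.

```sql
-- Hypothetical schema: integer key Id, compared columns Col1 and Col2.
-- Each bucket covers a range of 100,000 key values (arbitrary size).
WITH A AS (
    SELECT Id / 100000 AS Bucket,
           COUNT(*) AS RowCnt,
           CHECKSUM_AGG(BINARY_CHECKSUM(Id, Col1, Col2)) AS Chk
    FROM TableA
    GROUP BY Id / 100000
),
B AS (
    SELECT Id / 100000 AS Bucket,
           COUNT(*) AS RowCnt,
           CHECKSUM_AGG(BINARY_CHECKSUM(Id, Col1, Col2)) AS Chk
    FROM TableB
    GROUP BY Id / 100000
)
-- Return only the buckets that are missing on one side or whose
-- row count or checksum differs.
SELECT COALESCE(A.Bucket, B.Bucket) AS Bucket
FROM A
FULL OUTER JOIN B ON A.Bucket = B.Bucket
WHERE A.Bucket IS NULL
   OR B.Bucket IS NULL
   OR A.RowCnt <> B.RowCnt
   OR A.Chk <> B.Chk;
```

Only the buckets this returns would then need a row-by-row comparison, whether with a tool or with EXCEPT. Matching checksums still do not prove that a bucket is identical, since checksum collisions are possible.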


I believe you should investigate BINARY_CHECKSUM, although I would opt for the Red Gate tool:

http://msdn.microsoft.com/en-us/library/ms173784.aspx

Something like this:

SELECT BINARY_CHECKSUM(*) FROM myTable;
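
To turn the per-row checksum into an actual comparison, the two tables could be joined on their key. A minimal sketch, assuming a hypothetical single-column key named Id (a composite key would need every key column in the join) and the table names TableA and TableB:

```sql
-- Id is a hypothetical key column; adjust the join for the real key.
-- Lists keys that exist in both tables but whose row checksums differ.
SELECT a.Id
FROM (SELECT Id, BINARY_CHECKSUM(*) AS RowChk FROM TableA) AS a
JOIN (SELECT Id, BINARY_CHECKSUM(*) AS RowChk FROM TableB) AS b
    ON a.Id = b.Id
WHERE a.RowChk <> b.RowChk;
```

This does not report rows that exist in only one of the tables, and two different rows can still produce the same checksum, so it is a quick filter rather than a proof of equality.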