SQL WHERE ID IN (id1, id2, ..., idn)

What Ed Guiness suggested is really a performance booster , I had a query like this

select * from table where id in (id1,id2.........long list)

what i did :

DECLARE @temp table(
            ID  int
            )
insert into @temp 
select * from dbo.fnSplitter('#idlist#')

Then inner joined the temp with main table :

select * from table inner join temp on temp.id = table.id

And performance improved drastically.


Option 1 is the only good solution.

Why?

  • Option 2 does the same but you repeat the column name lots of times; additionally the SQL engine doesn't immediately know that you want to check if the value is one of the values in a fixed list. However, a good SQL engine could optimize it to have equal performance like with IN. There's still the readability issue though...

  • Option 3 is simply horrible performance-wise. It sends a query every loop and hammers the database with small queries. It also prevents it from using any optimizations for "value is one of those in a given list"


An alternative approach might be to use another table to contain id values. This other table can then be inner joined on your TABLE to constrain returned rows. This will have the major advantage that you won't need dynamic SQL (problematic at the best of times), and you won't have an infinitely long IN clause.

You would truncate this other table, insert your large number of rows, then perhaps create an index to aid the join performance. It would also let you detach the accumulation of these rows from the retrieval of data, perhaps giving you more options to tune performance.

Update: Although you could use a temporary table, I did not mean to imply that you must or even should. A permanent table used for temporary data is a common solution with merits beyond that described here.


First option is definitely the best option.

SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)

However considering that the list of ids is very huge, say millions, you should consider chunk sizes like below:

  • Divide you list of Ids into chunks of fixed number, say 100
  • Chunk size should be decided based upon the memory size of your server
  • Suppose you have 10000 Ids, you will have 10000/100 = 100 chunks
  • Process one chunk at a time resulting in 100 database calls for select

Why should you divide into chunks?

You will never get memory overflow exception which is very common in scenarios like yours. You will have optimized number of database calls resulting in better performance.

It has always worked like charm for me. Hope it would work for my fellow developers as well :)

Tags:

Sql

Select