Optimizing a Postgres query with a large IN

Instead of using a huge IN-list, join on a VALUES expression, or, if the list is large enough, put it in a temp table, index it, then join on it.

It would be nice if PostgreSQL could do this transformation internally and automatically, but at this point the planner doesn't know how.

Similar topics:

  • https://stackoverflow.com/q/24647503/398670
  • https://stackoverflow.com/q/17813492/398670

There are actually two different variants of the IN construct in Postgres. One works with a subquery expression (returning a set); the other works with a list of values and is just shorthand for

expression = value1
OR expression = value2
OR ...

You are using the second form, which is fine for a short list but much slower for long lists. Provide your list of values as a subquery expression instead. I was recently made aware of this variant:

WHERE "Post"."userId" IN (VALUES (201486), (1825186), (998608), ... )
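The same VALUES list can also be joined directly, as mentioned at the top. A minimal sketch of that form, reusing the ids and identifiers from the example above:

```sql
-- Join on the VALUES expression as a derived table
-- (column names follow the example above; adjust to your schema):
SELECT "Post".*
FROM  (VALUES (201486), (1825186), (998608)) v("userId")
JOIN  "Posts" "Post" USING ("userId");
```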

I like to pass an array, unnest it, and join to it. Similar performance, but the syntax is shorter:

...
FROM   unnest('{201486,1825186,998608, ...}'::int[]) "userId"
JOIN   "Posts" "Post" USING ("userId")

The two are equivalent as long as there are no duplicates in the provided set / array. Otherwise, the second form with a JOIN returns duplicate rows, while the first with IN returns only a single instance of each. This subtle difference leads to different query plans, too.
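To make the difference concrete, here is a sketch with a deliberately duplicated id (same hypothetical tables as above):

```sql
-- IN: each matching row in "Posts" appears once,
-- no matter how often the value repeats in the list.
SELECT *
FROM   "Posts" "Post"
WHERE  "Post"."userId" IN (VALUES (201486), (201486));

-- JOIN on unnest: each matching row appears once
-- per occurrence of the value in the array, i.e. twice here.
SELECT "Post".*
FROM   unnest('{201486,201486}'::int[]) "userId"
JOIN   "Posts" "Post" USING ("userId");
```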

Obviously, you need an index on "Posts"."userId".
For very long lists (thousands of values), go with an indexed temp table as @Craig suggested. This allows combined bitmap index scans over both tables, which is typically faster once there are multiple tuples to fetch per data page.
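A sketch of that temp-table approach, assuming the same tables as above (the temp table name is illustrative):

```sql
-- The PRIMARY KEY provides the index for the join.
CREATE TEMP TABLE tmp_user_ids (
   "userId" int PRIMARY KEY
);

-- Load the ids with COPY or a multi-row INSERT:
INSERT INTO tmp_user_ids VALUES (201486), (1825186), (998608);

-- Temp tables are not covered by autovacuum,
-- so give the planner statistics explicitly:
ANALYZE tmp_user_ids;

SELECT "Post".*
FROM   tmp_user_ids t
JOIN   "Posts" "Post" USING ("userId");
```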

Related:

  • How to use ANY instead of IN in a WHERE clause with Rails?
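The related question concerns the `= ANY (array)` form, which is another way to pass a long list as a single parameter. A sketch against the same hypothetical table:

```sql
-- Equivalent to IN for an array with no duplicates;
-- convenient because the whole list binds as one array parameter.
SELECT *
FROM   "Posts" "Post"
WHERE  "Post"."userId" = ANY ('{201486,1825186,998608}'::int[]);
```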

Aside: your naming convention is not very helpful; it makes your code verbose and hard to read. Use legal, lower-case, unquoted identifiers instead.