Postgres sometimes uses inferior index for WHERE a IN (...) ORDER BY b LIMIT N

Why?

For a LIMIT 1, Postgres may estimate it to be faster to traverse the index supporting the ORDER BY and just keep filtering until the first matching row is found. This is fast as long as more than a few rows qualify and one of them pops up early in ORDER BY order. But it is (very) slow if no qualifying row pops up early, and it degenerates into the worst case of scanning the whole index if no row qualifies at all. The same applies to any small LIMIT.
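
For context, the query shape under discussion looks roughly like this (a sketch with made-up ids, modeled on the workarounds further down):

SELECT id
FROM   mcqueen_base_imagemeta2
WHERE  image_id IN (123, 456, 789)  -- made-up ids
ORDER  BY id DESC
LIMIT  1;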

Postgres collects statistics about the most common values (the MCV list), but not about the least common ones - for obvious reasons: those would be far too many to be useful. And by default it has no statistics on correlations between columns. (While extended statistics can be created manually, that won't fit your use case anyway, as ID numbers are typically uncorrelated.)
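
For illustration only, extended statistics are created manually like this (a sketch; the statistics name and the column pair are made up, and as said, it won't help with uncorrelated IDs):

CREATE STATISTICS imagemeta_dep_stats (ndistinct, dependencies)
ON     image_id, id
FROM   mcqueen_base_imagemeta2;
ANALYZE mcqueen_base_imagemeta2;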

So Postgres has to base its decision on generic estimates. It's very hard to identify the sweet spot at which to switch from one index to the other. This gets even harder for a predicate like image_id IN (123, ... ) with many items, most of which are typically rare, very rare, or even non-existent. But if you put enough numbers into the list, Postgres will eventually expect that traversing the other index will find the first hit faster.

Solutions?

You may be able to improve the situation somewhat with a larger statistics target:

ALTER TABLE mcqueen_base_imagemeta2 ALTER image_id SET STATISTICS 2000;

That (among other things) increases the size of the MCV list for the column and helps identify more of the (less) common values. But it's not a general solution for the problem, and it makes ANALYZE and query planning a bit more expensive. Related:

  • Check statistics targets in PostgreSQL
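
To make the larger statistics target take effect right away, re-analyze the table (otherwise it kicks in with the next autovacuum-triggered ANALYZE):

ANALYZE mcqueen_base_imagemeta2;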

Upgrading to the latest version (soon to be Postgres 12) also helps, as general performance has improved and the planner has gotten smarter.

There are various techniques for a workaround, depending on cardinalities, value frequencies, access patterns, ... Completely disabling the ORDER BY index, as Laurenz demonstrates with the last query below, is one radical workaround - which can backfire for long lists or a very common image_id, where the ORDER BY index would, in fact, be much faster.

Related:

  • Can spatial index help a "range - order by - limit" query

Workaround for your case

This should work well for the given numbers: 5 billion rows, around 20 image_id values in the filter list, and a small LIMIT. It is most efficient for LIMIT 1 and a short list, but good for any small LIMIT and a manageable list size:

SELECT m.*
FROM   unnest( '{123, ...}'::bigint[]) i(image_id)
CROSS  JOIN LATERAL (
   SELECT m.id
   FROM   mcqueen_base_imagemeta2 m
   WHERE  m.image_id = i.image_id
   ORDER  BY m.id DESC
   LIMIT  1  -- or N
   ) m
ORDER  BY id DESC
LIMIT  1;  -- or N

Provide your list as an array and unnest() it, or use a VALUES expression (a sketch of that variant follows the link below). Related:

  • Optimizing a Postgres query with a large IN
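
A minimal sketch of the VALUES variant (same shape as the query above, with made-up ids):

SELECT m.*
FROM  (VALUES (123::bigint), (456), (789)) i(image_id)  -- made-up ids
CROSS  JOIN LATERAL (
   SELECT m.id
   FROM   mcqueen_base_imagemeta2 m
   WHERE  m.image_id = i.image_id
   ORDER  BY m.id DESC
   LIMIT  1  -- or N
   ) m
ORDER  BY id DESC
LIMIT  1;  -- or N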

It's essential to support this with a multicolumn index on (image_id, id DESC)!
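
A sketch of that index (the index name is just an example; consider CREATE INDEX CONCURRENTLY on a live table of this size):

CREATE INDEX mcqueen_base_imagemeta2_image_id_id_idx
ON     mcqueen_base_imagemeta2 (image_id, id DESC);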

You might then delete the existing index mcqueen_base_imagemeta2_image_id_616fe89c on just (image_id). See:

  • Is a composite index also good for queries on the first field?

This should result in one very fast index(-only) scan per image_id. And a final, (very) cheap sort step.

Fetching N rows for each image_id guarantees that the outer query has all the rows it needs. If you have meta-knowledge that fewer rows per single image_id can end up in the result, you can decrease the nested LIMIT accordingly.

Aside

(a common pattern in Django pagination)

Pagination with LIMIT and OFFSET? OK for the first page, but after that it's just a bad idea. Keyset ("seek") pagination scales much better - see the links and the sketch below.

  • Efficient pagination for big tables
  • What is the recommended way to join junction tables for efficient ordering/pagination?
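
A minimal keyset pagination sketch along the lines of those links, assuming pages ordered by id and a page size of 20 (both made up here):

SELECT id
FROM   mcqueen_base_imagemeta2
WHERE  image_id IN (123, 456)  -- made-up ids
AND    id < 987654321          -- smallest id returned on the previous page
ORDER  BY id DESC
LIMIT  20;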

The planner thinks it is going to find 78722 rows, but it really finds 16, so that is going to lead to some bad plans.

When a value in the IN list is not present in the column's MCV list, the planner guesses its frequency using the n_distinct value, which is probably way off (you didn't answer my question about that). It takes the number of tuples not covered by the MCV frequencies and divides it by the number of distinct values not listed in the MCV list. So basically: ntuples * (1 - sum of MCV frequencies) / (n_distinct - length of MCV list). This simplified formula ignores NULLs.
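
That estimate can be reproduced roughly from the catalogs - a sketch that ignores NULLs and assumes a positive n_distinct:

SELECT c.reltuples
       * (1 - coalesce((SELECT sum(f) FROM unnest(s.most_common_freqs) f), 0))
       / (s.n_distinct - coalesce(array_length(s.most_common_freqs, 1), 0))
         AS est_rows_per_non_mcv_value
FROM   pg_stats s
JOIN   pg_class c ON c.relname = s.tablename
WHERE  s.tablename = 'mcqueen_base_imagemeta2'
AND    s.attname = 'image_id';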

As @ErwinBrandstetter suggests, you might be able to improve the situation by increasing the statistics sample size, which enlarges the MCV list. That might also increase the accuracy of the n_distinct estimate. But with 6 billion rows, it might not be possible to increase the sample size by enough. Also, if image_id values are clumped together, with duplicates likely to occur on the same page, then the sampling method PostgreSQL uses is quite biased when it comes to computing n_distinct, and that bias is resistant to fixing just by increasing the sample size.

A simpler way to fix this may be to override n_distinct manually:

alter table mcqueen_base_imagemeta2 alter column image_id set (n_distinct=1000000000);
analyze mcqueen_base_imagemeta2;

This method doesn't increase the time or storage required by ANALYZE, the way increasing the sample size does, and is also more likely to succeed.


The simple solution is to modify the ORDER BY expression so that the semantics are unchanged, but PostgreSQL can no longer use the index on id:

SELECT * FROM mcqueen_base_imagemeta2 
  WHERE image_id IN ( 123, ... )
  ORDER BY id + 0 DESC
  LIMIT 1;