PostgreSQL query very slow when subquery added

The core of the problem becomes obvious here:

Seq Scan on publication (cost=0.01..349652.84 rows=744661 width=8) (actual time=2735.888..2841.393 rows=1 loops=1)

Postgres estimates to return 744661 rows while, in fact, it turns out to be a single row. If Postgres does not know better what to expect from the query it cannot plan better. We would need to see the actual query hidden behind (SELECT 9762715) - and probably also know table definition, constraints, cardinalities and data distribution. Obviously, Postgres is not able to predict how few rows will be returned by it. There may be ways to rewrite the query, depending on what it is.

If you know that the subquery can never return more than n rows, you can just tell Postgres by using:

SELECT mtid
FROM   publication
WHERE  mtid IN (SELECT ... LIMIT n) --  OR last_modifier=21321
LIMIT  5000;

If n is small enough, Postgres will switch to (bitmap) index scans. However, that only works for the simple case. Stops working when adding an OR condition: the query planner can't currently cope with that.

I rarely use IN (SELECT ...) to begin with. Typically there is a better way to implement the same, often with an EXISTS semi-join. Sometimes with a (LEFT) JOIN (LATERAL) ...

The obvious workaround would be to use UNION, but you ruled that out. I can't say more without knowing the actual subquery and other relevant details.


My colleague has found a way to change the query so that it needs a simple rewrite and does what it needs to do, i.e. doing the subselect in one step, and then doing the further operations on the result:

SELECT mtid FROM publication 
WHERE 
  mtid = ANY( (SELECT ARRAY(SELECT 9762715))::bigint[] )
  OR last_modifier=21321
LIMIT 5000;

The explain analyze now is:

 Limit  (cost=92.58..9442.38 rows=2478 width=8) (actual time=0.071..0.074 rows=1 loops=1)
   InitPlan 2 (returns $1)
     ->  Result  (cost=0.01..0.02 rows=1 width=0) (actual time=0.010..0.011 rows=1 loops=1)
           InitPlan 1 (returns $0)
             ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)
   ->  Bitmap Heap Scan on publication  (cost=92.56..9442.36 rows=2478 width=8) (actual time=0.069..0.070 rows=1 loops=1)
         Recheck Cond: ((mtid = ANY (($1)::bigint[])) OR (last_modifier = 21321))
         Heap Blocks: exact=1
         ->  BitmapOr  (cost=92.56..92.56 rows=2478 width=0) (actual time=0.060..0.060 rows=0 loops=1)
               ->  Bitmap Index Scan on publication_pkey  (cost=0.00..44.38 rows=10 width=0) (actual time=0.046..0.046 rows=1 loops=1)
                     Index Cond: (mtid = ANY (($1)::bigint[]))
               ->  Bitmap Index Scan on publication_last_modifier_btree  (cost=0.00..46.94 rows=2468 width=0) (actual time=0.011..0.011 rows=0 loops=1)
                     Index Cond: (last_modifier = 21321)
 Planning time: 0.704 ms
 Execution time: 0.153 ms

It seems we can create a simple parser that finds and rewrites all subselects this way, and add it to a hibernate hook to manipulate the native query.