How to make use of indexes with inner join in PostGIS?

Firstly, as has been noted in the comments, calling an ST function through its leading-underscore variant, e.g. _ST_3DDWithin, will lead to the index not being used. I can't find any recent mention of this, but if you search the older docs for, e.g., _ST_Intersects, they state:

To avoid index use, use the function _ST_Intersects.

EDIT: As clarified by @dbaston in the comments, the functions with the leading underscore are internal functions that do not use the index when called and this continues to be the case (although it is hard to find in the docs).
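
To make the difference concrete, here is a minimal sketch of the two spellings side by side. It assumes the table and column names used elsewhere in this answer (table_a, table_b, geom); the index name table_b_geom_idx is made up for illustration, and you can skip the CREATE INDEX if the column is already indexed.

```sql
-- Hypothetical index name; skip this if table_b.geom already has a spatial index.
CREATE INDEX table_b_geom_idx ON table_b USING GIST (geom);

-- Internal function: only the exact distance test runs,
-- so the planner has no index-assisted filter to work with.
SELECT count(*)
  FROM table_a a
    JOIN table_b b ON _ST_3DDWithin(a.geom, b.geom, 50);

-- Public function: includes a bounding-box comparison
-- that can make use of the spatial index.
SELECT count(*)
  FROM table_a a
    JOIN table_b b ON ST_3DDWithin(a.geom, b.geom, 50);
```

Both queries should return the same count; only the second gives the planner a chance to use the index.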

Your query could possibly benefit from the LATERAL JOIN syntax, which lends itself well to k nearest neighbour (kNN) problems like this one.

SELECT
    a.a_id,
    b.b_id,
    b.height - ST_3DDistance(b.geom, a.geom) AS fall
  FROM table_a a
    LEFT JOIN LATERAL
      (SELECT
           tb.b_id,
           tb.geom,
           tb.height
         FROM table_b tb
         -- the public function lets the planner use the spatial index
         WHERE ST_3DDWithin(a.geom, tb.geom, 50)
           AND tb.height - ST_3DDistance(tb.geom, a.geom) > 0
         ORDER BY tb.height - ST_3DDistance(tb.geom, a.geom) DESC
         LIMIT 1
      ) b ON TRUE;

For each geometry in table a, this finds the k candidates from table b (in this case 1, due to LIMIT 1) that lie within 50 meters, ordered by fall (height minus 3D distance), largest first. It is written using a LEFT JOIN, as it is conceivable that there might be some geometries in table a that are not within 50 meters of anything in table b; those rows are returned with NULLs for the table b columns.

Lateral subqueries can reference columns from preceding items in the FROM clause, which makes them more powerful than standard subqueries; see the docs.

I can't test this against your data, but when I have run similar queries, the EXPLAIN statement indicates proper index use.
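
As a rough sketch of how you might check this yourself, prefix the query with EXPLAIN (or EXPLAIN ANALYZE, which actually runs it and reports timings). The table and column names are the ones assumed throughout this answer, and the alias tb is only there for readability.

```sql
-- EXPLAIN ANALYZE executes the query; use plain EXPLAIN on very large tables.
EXPLAIN ANALYZE
SELECT
    a.a_id,
    b.b_id,
    b.height - ST_3DDistance(b.geom, a.geom) AS fall
  FROM table_a a
    LEFT JOIN LATERAL
      (SELECT tb.b_id, tb.geom, tb.height
         FROM table_b tb
         WHERE ST_3DDWithin(a.geom, tb.geom, 50)
           AND tb.height - ST_3DDistance(tb.geom, a.geom) > 0
         ORDER BY tb.height - ST_3DDistance(tb.geom, a.geom) DESC
         LIMIT 1
      ) b ON TRUE;
```

In the plan, an Index Scan or Bitmap Index Scan node on table_b that names your spatial index is what you are hoping for; a Seq Scan on table_b inside the nested loop suggests the index is not being picked up.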


The PostGIS documentation recommends the following steps to ensure that the indexes and the query planner are optimized:

  1. Make sure statistics are gathered about the number and distributions of values in a table, to provide the query planner with better information to make decisions around index usage. VACUUM ANALYZE will compute both.

  2. If vacuuming does not help, you can temporarily force the planner to use the index information by using the set enable_seqscan to off; command. This way you can check whether the planner is capable of generating an index-accelerated query plan for your query at all. You should use this command only for debugging: generally speaking, the planner knows better than you do about when to use indexes. Once you have run your query, do not forget to set enable_seqscan back to on, so that other queries will utilize the planner as normal.

  3. If set enable_seqscan to off; helps your query to run, your Postgres is likely not tuned for your hardware. If you find the planner is wrong about the cost of sequential vs index scans, try reducing the value of random_page_cost in postgresql.conf or using set random_page_cost to 1.1;. The default value for the parameter is 4; try setting it to 1 (on SSD) or 2 (on fast magnetic disks). Decreasing the value makes the planner more inclined to use index scans.

  4. If set enable_seqscan to off; does not help your query, you may be using a construction that Postgres is not yet able to untangle. A subquery with an inline select is one example - you need to rewrite it into a form the planner can optimize, say, a LATERAL JOIN.
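
Steps 1-3 might look roughly like this in a psql session. This is only a sketch: the table names are the ones assumed in this answer, and the EXPLAIN target is a stand-in for whatever query you are actually tuning.

```sql
-- Step 1: refresh the statistics the planner relies on.
VACUUM ANALYZE table_a;
VACUUM ANALYZE table_b;

-- Step 2: debugging only. Discourage sequential scans for this session
-- and see whether an index-based plan is possible at all.
SET enable_seqscan TO off;
EXPLAIN
SELECT a.a_id, b.b_id
  FROM table_a a
    JOIN table_b b ON ST_3DDWithin(a.geom, b.geom, 50);
SET enable_seqscan TO on;   -- put it back when you are done

-- Step 3: if step 2 produced a better plan, try a lower random_page_cost
-- for this session before changing postgresql.conf.
SET random_page_cost TO 1.1;
EXPLAIN
SELECT a.a_id, b.b_id
  FROM table_a a
    JOIN table_b b ON ST_3DDWithin(a.geom, b.geom, 50);
RESET random_page_cost;     -- back to the configured value
```

SET only affects the current session, so none of this changes behaviour for other connections.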

So, first try steps 1-3 before rewriting your query. If they don't get the index used, you could try to modify the query.

I believe (to the best of my ability to whip up SQL without running the code) that the query below will return identical results to yours, but I don't know if it will be more efficient.

SELECT DISTINCT ON (a_id)
    b_id,
    fall,
    b_geom,
    a_id
  FROM (
    SELECT
        table_b.b_id AS b_id,
        table_b.height - ST_3DDistance(table_b.geom, table_a.geom) AS fall,
        table_b.geom AS b_geom,
        table_a.a_id AS a_id
      FROM table_a
        -- public function (no leading underscore) so the index can be used
        INNER JOIN table_b ON ST_3DDWithin(table_a.geom, table_b.geom, 50)
  ) a
WHERE fall >= 0
ORDER BY a_id, fall DESC;