Do covering indexes in PostgreSQL help JOIN columns?

Will the index be used as a covering index to help the JOIN in the above query?

It depends. Up to Postgres 10, Postgres offers "index-only" scans as an index access method, but there are no "covering indexes" per se.

Starting with Postgres 11, true covering indexes with INCLUDE columns are available. Blog entry by Michael Paquier introducing the feature:

  • https://paquier.xyz/postgresql-2/postgres-11-covering-indexes/
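
With the tables from the question, an INCLUDE index in Postgres 11 might look like this (the index name is my own):

CREATE INDEX table2_t2c1_incl_idx ON table2 (t2c1) INCLUDE (t1);

Here t2c1 is the key column used for searching, while t1 is stored in the index as payload only, so both the filter and the join key can be read from the index without visiting the heap.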

Related answer with code example:

  • Does a query with a primary key and foreign keys run faster than a query with just primary keys?

That said, the index CREATE INDEX ON table2 (t2c1, t1); makes perfect sense for the query you demonstrate. It can be used for an index-only scan if additional preconditions are met, or for a bitmap index scan or a plain index scan. Related:

  • Index usage on a temporary table
  • Is a composite index also good for queries on the first field?
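
The main additional precondition for an index-only scan is an up-to-date visibility map, which a plain VACUUM maintains. A minimal sketch, using the query from the question:

VACUUM table2;  -- updates the visibility map

EXPLAIN (ANALYZE)  -- look for "Index Only Scan" and low "Heap Fetches"
SELECT table1.t1c1
FROM   table1
JOIN   table2 ON table2.t1 = table1.id
WHERE  table2.t2c1 = 42;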

JOIN conditions and WHERE conditions are almost completely equivalent in Postgres. They certainly can use indexes in the same way. You can rewrite your query:

SELECT table1.t1c1
FROM   table1
JOIN   table2 ON table2.t1 = table1.id
WHERE  table2.t2c1 = 42;

With this equivalent:

SELECT table1.t1c1
FROM   table1 CROSS JOIN table2
WHERE  table2.t1 = table1.id
AND    table2.t2c1 = 42;

The first form is preferable, though, since it is easier to read.

Why only "almost" equivalent? (It makes no difference for the simple query at hand.)

  • Why does this implicit join get planned differently than an explicit join?

Related:

  • Are implicit joins as efficient as explicit joins in Postgres?
  • What does [FROM x, y] mean in Postgres?
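
The "almost" comes down to planner settings: explicit JOIN syntax and comma-separated FROM items are flattened into the same planning problem, but they are governed by two separate limits, which can lead to different plans in queries with many tables. To inspect the current values (both default to 8):

SHOW join_collapse_limit;   -- applies to explicit JOIN syntax
SHOW from_collapse_limit;   -- applies to comma-separated FROM items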

Will the index be used as a covering index to help the JOIN in the above query? Should I change my index writing strategy to cover foreign key columns?

Not likely in the above query. This is a deceptively complex problem; the result depends on the estimates and the selectivity of the two conditions:

  • table2.t1 = table1.id
  • table2.t2c1 = 42

Essentially, you want to vary the row counts in both tables so that each of the two conditions becomes more or less selective. And if you get a nested loop, you want to increase the row counts until that is no longer the cheapest join method.
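
Once the test tables below are set up, you can check the planner's row estimate for each condition on table2 separately:

EXPLAIN SELECT * FROM table2 WHERE t2c1 = 42;   -- selectivity of the filter
EXPLAIN SELECT * FROM table2 WHERE t1 = 1;      -- selectivity of the join key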

CREATE TABLE table1(
   id INTEGER PRIMARY KEY,
   t1c1 INTEGER,
   t1c2 INTEGER
);
INSERT INTO table1(id, t1c1, t1c2)
  SELECT x,x,x FROM generate_series(1,1000)
  AS gs(x);

CREATE TABLE table2(
  id INTEGER PRIMARY KEY,
  t1 INTEGER REFERENCES table1(id),
  t2c1 INTEGER
);
INSERT INTO table2(id, t1, t2c1)
SELECT x,1+x%1000,x%50 FROM generate_series(1,1e6)
  AS gs(x);

EXPLAIN ANALYZE
  SELECT t1c1
  FROM table1
  JOIN table2 ON table2.t1 = table1.id
  WHERE t2c1 = 42;

Now check the plan.

Now create the compound index,

CREATE INDEX ON table2 (t2c1, t1);
-- ANALYZE refreshes planner statistics; VACUUM updates the visibility map,
-- which is a precondition for index-only scans
VACUUM FULL ANALYZE table1;
VACUUM FULL ANALYZE table2;

And check the plan again,

EXPLAIN ANALYZE
  SELECT t1c1
  FROM table1
  JOIN table2 ON table2.t1 = table1.id
  WHERE t2c1 = 42;

You can drop the indexes and re-create them in either column order to find which form the planner prefers:

CREATE INDEX ON table2 (t1, t2c1);

or

CREATE INDEX ON table2 (t2c1, t1);

Ultimately, though, this is a lot of work. I suggest starting off with

CREATE INDEX ON table2 (t1);
CREATE INDEX ON table2 (t2c1);

And optimizing only if that isn't sufficient.

You can also disable specific planner options to see whether another plan really is faster or slower, and then look into fixing that. But that can also be a lot of work.
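
For example, join and scan methods can be discouraged per session with the enable_* settings. If the plan that appears after disabling a node type turns out to be faster, your cost settings may need tuning:

-- discourage a nested loop for this session only
SET enable_nestloop = off;

EXPLAIN ANALYZE
SELECT t1c1
FROM table1
JOIN table2 ON table2.t1 = table1.id
WHERE t2c1 = 42;

RESET enable_nestloop;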