Index that is not used, yet influences query

When you create an expression index, it causes PostgreSQL to gather statistics on the that expression. With those statistics on hand, it now has an accurate estimate for the number of aggregated rows that the query will return, which leads it to make a better plan choice.

Specifically in this case, without those extra statistics it thought the hash table would be too large to fit in work_mem, so it didn't choose that method.


VACUUM ANALYZE makes the difference in your example. Plus, as @jjanes supplied, the additional statistics for the functional index. Per documentation:

pg_statistic also stores statistical data about the values of index expressions. These are described as if they were actual data columns; in particular, starelid references the index. No entry is made for an ordinary non-expression index column, however, since it would be redundant with the entry for the underlying table column.

However, creating the index does not by itself cause Postgres to gather statistics. Try:

CREATE INDEX myindex ON mytable ((myid/100000));
SELECT * FROM pg_statistic WHERE starelid = 'myindex'::regclass;

Returns nothing until you run your first ANALYZE (or VACUUM ANALYZE, or the autovacuum daemon kicks in).

ANALYZE mytable;
SELECT * FROM pg_statistic WHERE starelid = 'myindex'::regclass;

Now you'll see added statistics.

Since the whole table has to be read anyway, Postgres is going to use a sequential scan unless it expects the computation of myid/100000 to be expensive enough to switch, which it isn't.

Your only other chance would be an index-only scan if the index is much smaller than the table - and preconditions for an index-only scan are met. Details in the Postgres Wiki and in the manual.

As long as that functional index is not used, the collateral benefit from added statistics is moderate. If the table was read-only the cost would be low - but then again, we'd probably see an index-only scan right away.

Maybe you can also achieve better query plans by setting a higher statistics target for mytable.myid. That would only incur a minor cost. More: