Refreshing materialized view CONCURRENTLY causes table bloat

First, let's explain the bloat

REFRESH MATERIALIZED VIEW CONCURRENTLY is implemented in src/backend/commands/matview.c, and the comment there is enlightening:

/*
 * refresh_by_match_merge
 *
 * Refresh a materialized view with transactional semantics, while allowing
 * concurrent reads.
 *
 * This is called after a new version of the data has been created in a
 * temporary table.  It performs a full outer join against the old version of
 * the data, producing "diff" results.  This join cannot work if there are any
 * duplicated rows in either the old or new versions, in the sense that every
 * column would compare as equal between the two rows.  It does work correctly
 * in the face of rows which have at least one NULL value, with all non-NULL
 * columns equal.  The behavior of NULLs on equality tests and on UNIQUE
 * indexes turns out to be quite convenient here; the tests we need to make
 * are consistent with default behavior.  If there is at least one UNIQUE
 * index on the materialized view, we have exactly the guarantee we need.
 *
 * The temporary table used to hold the diff results contains just the TID of
 * the old record (if matched) and the ROW from the new table as a single
 * column of complex record type (if matched).
 *
 * Once we have the diff table, we perform set-based DELETE and INSERT
 * operations against the materialized view, and discard both temporary
 * tables.
 *
 * Everything from the generation of the new data to applying the differences
 * takes place under cover of an ExclusiveLock, since it seems as though we
 * would want to prohibit not only concurrent REFRESH operations, but also
 * incremental maintenance.  It also doesn't seem reasonable or safe to allow
 * SELECT FOR UPDATE or SELECT FOR SHARE on rows being updated or deleted by
 * this command.
 */

So the materialized view is refreshed by deleting rows and inserting new ones from a temporary table. This can of course lead to dead tuples and table bloat, which is confirmed by your VACUUM (VERBOSE) output.

In a way, that's the price you pay for CONCURRENTLY.
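The mechanism the comment describes can be sketched in SQL. This is only an illustration: the actual statements are generated internally by matview.c, and the names newdata and diff are made up here; the view tab_v with columns id and val matches the example further down.

```sql
-- Illustrative sketch only; matview.c builds roughly equivalent statements.
-- "newdata" stands for the temporary table holding the freshly computed result.

-- Full outer join old vs. new; rows unmatched on either side form the diff.
CREATE TEMP TABLE diff AS
SELECT mv.ctid AS tid, newdata.*
FROM tab_v mv
FULL JOIN newdata ON mv.id = newdata.id AND mv.val = newdata.val
WHERE mv.ctid IS NULL OR newdata.id IS NULL;

-- Rows that exist only in the old version are deleted (creating dead tuples) ...
DELETE FROM tab_v
WHERE ctid IN (SELECT tid FROM diff WHERE tid IS NOT NULL);

-- ... and rows that exist only in the new version are inserted.
INSERT INTO tab_v
SELECT id, val FROM diff WHERE tid IS NULL;
```

Unchanged rows match in the join and are filtered out, so only rows that actually changed are deleted and re-inserted.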

Second, let's debunk the myth that VACUUM cannot remove the dead tuples

VACUUM will remove the dead rows, but it cannot reduce the bloat (that can be done with VACUUM (FULL), but that would lock the view just like REFRESH MATERIALIZED VIEW without CONCURRENTLY).

I suspect that the query you use to determine the number of dead tuples is just an estimate that gets the number of dead tuples wrong.
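For example, n_dead_tup in pg_stat_user_tables comes from the statistics collector and is only an estimate, while pgstattuple scans the table and reports exact numbers:

```sql
-- Estimate from the statistics collector; can lag behind or be inaccurate
SELECT n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'tab_v';

-- Exact count, at the price of scanning the whole table
SELECT dead_tuple_count
FROM pgstattuple('tab_v');
```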

An example that demonstrates all that

CREATE TABLE tab AS SELECT id, 'row ' || id AS val FROM generate_series(1, 100000) AS id;

-- make sure autovacuum doesn't spoil our demonstration
CREATE MATERIALIZED VIEW tab_v WITH (autovacuum_enabled = off)
AS SELECT * FROM tab;

-- required for CONCURRENTLY
CREATE UNIQUE INDEX ON tab_v (id);

Use the pgstattuple extension to accurately measure table bloat:

CREATE EXTENSION pgstattuple;

SELECT * FROM pgstattuple('tab_v');
-[ RECORD 1 ]------+--------
table_len          | 4431872
tuple_count        | 100000
tuple_len          | 3788895
tuple_percent      | 85.49
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 16724
free_percent       | 0.38

Now let's delete some rows in the table, refresh and measure again:

DELETE FROM tab WHERE id BETWEEN 40001 AND 80000;

REFRESH MATERIALIZED VIEW CONCURRENTLY tab_v;

SELECT * FROM pgstattuple('tab_v');
-[ RECORD 1 ]------+--------
table_len          | 4431872
tuple_count        | 60000
tuple_len          | 2268895
tuple_percent      | 51.19
dead_tuple_count   | 40000
dead_tuple_len     | 1520000
dead_tuple_percent | 34.3
free_space         | 16724
free_percent       | 0.38

Lots of dead tuples. VACUUM gets rid of these:

VACUUM tab_v;

SELECT * FROM pgstattuple('tab_v');
-[ RECORD 1 ]------+--------
table_len          | 4431872
tuple_count        | 60000
tuple_len          | 2268895
tuple_percent      | 51.19
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 1616724
free_percent       | 36.48

The dead tuples are gone, but now there is a lot of empty space.
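As mentioned above, VACUUM (FULL) can reclaim that empty space, but only at the price of an ACCESS EXCLUSIVE lock that blocks all access to the view while it runs:

```sql
-- Rewrites the table compactly, returning the free space to the
-- operating system, but blocks even SELECTs on the view meanwhile
VACUUM (FULL) tab_v;

-- table_len should now shrink to roughly tuple_len plus page overhead
SELECT table_len, free_percent FROM pgstattuple('tab_v');
```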


I'm adding to @Laurenz Albe's full answer above. There is another possible cause for the bloat. Consider the following scenario:

You have a view that mostly stays the same (1,000,000 records, only 100 change per refresh), and yet you still get 500,000 dead tuples. The reason can be NULLs in the indexed columns.

As described in the answer above, when a view is refreshed concurrently, a new copy is created and compared with the old one. The comparison uses the mandatory unique index. But what about NULLs? In SQL, NULL never compares equal to NULL. So if the columns of your unique index allow NULLs, rows containing NULLs never match, and they will always be deleted and re-inserted, even if they are unchanged.
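You can see this NULL behavior directly:

```sql
SELECT NULL = NULL;          -- yields NULL, not true
SELECT 1 WHERE NULL = NULL;  -- returns no row, so such rows never "match"
```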

To remove this bloat, you can add an extra column that coalesces the nullable column to a value that never occurs in the data (-1, to_timestamp(0), ...) and use that column in the unique index instead.
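A sketch of that fix, with made-up names (nullable_ts is assumed to be a nullable timestamp column that was previously part of the unique index):

```sql
-- Add a never-NULL companion column for the nullable one ...
CREATE MATERIALIZED VIEW tab_v AS
SELECT t.*,
       coalesce(t.nullable_ts, to_timestamp(0)) AS nullable_ts_nn
FROM tab t;

-- ... and build the unique index on the non-nullable column instead,
-- so unchanged rows match during the concurrent refresh
CREATE UNIQUE INDEX ON tab_v (id, nullable_ts_nn);
```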