Why can't rows inserted in a CTE be updated in the same statement?

All sub-statements of a query with CTEs happen virtually at the same time, i.e. they are based on the same snapshot of the database.

The UPDATE sees the same state of the underlying table as the INSERT, which means the row with val = 1 is not there yet. The manual clarifies here:

All the statements are executed with the same snapshot (see Chapter 13), so they cannot "see" one another's effects on the target tables.

Each statement can see what's returned by another CTE in the RETURNING clause. But the underlying tables look all the same to them.
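
To make that concrete, here is a minimal sketch. The table definition is an assumption, since the question does not show it. The main SELECT receives the new row through RETURNING, while a direct look at tbl in the same statement still reflects the snapshot taken before the INSERT:

```sql
-- Assumed table definition; the question does not show it.
CREATE TABLE IF NOT EXISTS tbl (id serial PRIMARY KEY, val int);

WITH newval AS (
    INSERT INTO tbl (val) VALUES (1)
    RETURNING id, val
)
SELECT n.id,
       n.val,
       -- Counts rows in the statement's snapshot, so the row
       -- inserted by newval is not included here.
       (SELECT count(*) FROM tbl) AS rows_in_snapshot
FROM   newval AS n;
```

On an empty table, rows_in_snapshot should come back as 0 even though newval returns one row.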

You would need two statements (in a single transaction) for what you are trying to do. The given example should really just be a single INSERT to begin with, but that may be due to the simplified example.

This is an implementation decision. It is described in the Postgres documentation, under WITH Queries (Common Table Expressions). Two paragraphs there relate to the issue.

First, the reason for the observed behaviour:

The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot "see" one another's effects on the target tables. This alleviates the effects of the unpredictability of the actual order of row updates, and means that RETURNING data is the only way to communicate changes between different WITH sub-statements and the main query. An example of this is that in ...

After I posted a suggestion to pgsql-docs, Marko Tiikkaja explained (in agreement with Erwin's answer):

The insert-update and insert-delete cases don't work because the UPDATEs and DELETEs have no way of seeing the INSERTed rows due to their snapshot having been taken before the INSERT happened. There is nothing unpredictable about these two cases.

So the reason why your statement does not update can be explained by the first paragraph above (about "snapshots"). When you have modifying CTEs, all of them and the main query are executed against, and "see", the same snapshot of the data (tables), as it was immediately before the statement started. CTEs can pass information about what they inserted/updated/deleted to one another and to the main query through the RETURNING clause, but they can't see the changes in the tables directly. So let's see what happens in your statement:

WITH newval AS (
    INSERT INTO tbl(val) VALUES (1) RETURNING id
) UPDATE tbl SET val=2 FROM newval WHERE tbl.id=newval.id;

We have 2 parts, the CTE (newval):

-- newval
     INSERT INTO tbl(val) VALUES (1) RETURNING id

and the main query:

-- main 
UPDATE tbl SET val=2 FROM newval WHERE tbl.id=newval.id

The flow of execution is something like this:

           initial data: tbl
                id │ val 
                 (empty)
               /         \
              /           \
             /             \
    newval:                 \
       tbl (after newval)    \
           id │ val           \
            1 │   1           |
                              |
    newval: returns           |
           id                 |
            1                 |
               \              |
                \             |
                 \            |
                    main query

As a result, when the main query joins tbl (as seen in the snapshot) with the newval result, it joins an empty table with a 1-row table, so it updates 0 rows. The statement never gets to modify the newly inserted row, and that is what you see.
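
The behaviour can be reproduced with a short script. This is a sketch under the assumption that tbl has a serial id and an integer val; the question does not show the actual definition.

```sql
-- Assumed table definition; the question does not show it.
CREATE TABLE IF NOT EXISTS tbl (id serial PRIMARY KEY, val int);

WITH newval AS (
    INSERT INTO tbl (val) VALUES (1) RETURNING id
)
UPDATE tbl SET val = 2 FROM newval WHERE tbl.id = newval.id;
-- psql reports "UPDATE 0": the snapshot of tbl has no row with the new id.

-- A separate statement sees the inserted row, still with val = 1.
SELECT id, val FROM tbl ORDER BY id;
```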

The solution in your case is either to rewrite the statement so that it inserts the correct values in the first place, or to use 2 statements: one that inserts and a second that updates.
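
Both options might look like the sketch below. The table definition and the use of currval() with pg_get_serial_sequence() to find the new id are assumptions for illustration; they only work if id is backed by a serial or identity sequence.

```sql
-- Assumed table definition; the question does not show it.
CREATE TABLE IF NOT EXISTS tbl (id serial PRIMARY KEY, val int);

-- Option 1: two statements in one transaction.
-- The UPDATE is a separate statement, so it sees the row
-- written by the INSERT earlier in the same transaction.
BEGIN;
INSERT INTO tbl (val) VALUES (1);
UPDATE tbl
SET    val = 2
WHERE  id = currval(pg_get_serial_sequence('tbl', 'id'));
COMMIT;

-- Option 2: insert the final value directly.
INSERT INTO tbl (val) VALUES (2);
```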

There are other, similar situations, for example a statement with an INSERT and then a DELETE on the same rows. The DELETE would affect no rows, for exactly the same reason.

Some other cases, update-update and update-delete, and their behaviour are explained in a later paragraph on the same docs page:
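
A sketch of the insert-delete variant, using the same assumed table definition; the CTE name ins and the value 3 are made up for illustration:

```sql
-- Assumed table definition; the question does not show it.
CREATE TABLE IF NOT EXISTS tbl (id serial PRIMARY KEY, val int);

WITH ins AS (
    INSERT INTO tbl (val) VALUES (3) RETURNING id
)
DELETE FROM tbl USING ins WHERE tbl.id = ins.id;
-- psql reports "DELETE 0": the DELETE cannot see the row ins just added.

-- The row with val = 3 is still in the table afterwards.
SELECT id, val FROM tbl WHERE val = 3;
```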

Trying to update the same row twice in a single statement is not supported. Only one of the modifications takes place, but it is not easy (and sometimes not possible) to reliably predict which one. This also applies to deleting a row that was already updated in the same statement: only the update is performed. Therefore you should generally avoid trying to modify a single row twice in a single statement. In particular avoid writing WITH sub-statements that could affect the same rows changed by the main statement or a sibling sub-statement. The effects of such a statement will not be predictable.

And in the reply from Marko Tiikkaja:

The update-update and update-delete cases are explicitly not caused by the same underlying implementation detail (as the insert-update and insert-delete cases).
The update-update case doesn't work because it internally looks like the Halloween problem, and Postgres has no way of knowing which tuples would be okay to update twice and which ones could reintroduce the Halloween problem.

So the reason is the same (how modifying CTEs are implemented and how each CTE sees the same snapshot), but the details differ in these 2 cases, as they are more complex and the results can be unpredictable in the update-update case.

In the insert-update case (yours) and the similar insert-delete case, the results are predictable: only the insert happens, because the second operation (update or delete) has no way to see and affect the newly inserted rows.

The suggested solution though is the same for all cases that try to modify the same rows more than once: Don't do it. Either write statements that modify each row once or use separate (2 or more) statements.
