PostgreSQL - insert rows based on select from another table, and update an FK in that table with the newly inserted rows

There are several ways to solve the problem.

1. temporarily add a column

As others mentioned, the straightforward way is to temporarily add a column reminder_id to the dateset table, populate it with the original IDs from the reminder table during the insert, use it to join reminder with dateset, and then drop the temporary column.
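The steps above could be sketched roughly as follows. This is only a sketch, assuming the reminder(id, start, dateset_id) and dateset(id, start) tables from the question and that no column named reminder_id exists in dateset yet:

```sql
-- Sketch only: table and column names are taken from the question,
-- the temporary column name reminder_id is the one suggested above.
BEGIN;

-- Temporary column that remembers which reminder each new dateset row came from
ALTER TABLE dateset ADD COLUMN reminder_id integer;

-- Copy the start values and keep the originating reminder id next to them
INSERT INTO dateset (start, reminder_id)
SELECT start, id FROM reminder;

-- Point each reminder at the dateset row that was created for it
UPDATE reminder
SET dateset_id = dateset.id
FROM dateset
WHERE dateset.reminder_id = reminder.id;

-- The helper column is no longer needed
ALTER TABLE dateset DROP COLUMN reminder_id;

COMMIT;
```

Because DDL is transactional in PostgreSQL, wrapping everything in one transaction means a failure part-way through should leave no stray column behind.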

2. when start is unique

If the values of the start column are unique, it is possible to do it without an extra column by joining the reminder table with the dateset table on the start column.

INSERT INTO dateset (start)
SELECT start FROM reminder;

WITH
CTE_Joined
AS
(
    SELECT
        reminder.id AS reminder_id
        ,reminder.dateset_id AS old_dateset_id
        ,dateset.id AS new_dateset_id
    FROM
        reminder
        INNER JOIN dateset ON dateset.start = reminder.start
)
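-- PostgreSQL cannot update a CTE directly, so update reminder and join to the CTE
UPDATE reminder
SET dateset_id = CTE_Joined.new_dateset_id
FROM CTE_Joined
WHERE reminder.id = CTE_Joined.reminder_id
;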
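Before choosing this variant, it may be worth checking whether start really is unique. A minimal sketch of such a check, assuming the reminder table from the question:

```sql
-- Lists every start value that occurs more than once in reminder.
-- An empty result means the join on start alone is unambiguous.
SELECT start, COUNT(*) AS occurrences
FROM reminder
GROUP BY start
HAVING COUNT(*) > 1;
```

If this returns any rows, joining on start alone would match one reminder to several dateset rows.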

3. when start is not unique

It is possible to do it without a temporary column even in this case. The main idea is the following. Let's have a look at this example:

We have two rows in reminder with the same start value and IDs 3 and 7:

reminder
id    start         dateset_id
3     2015-01-01    NULL
7     2015-01-01    NULL

After we insert them into the dateset, there will be new IDs generated, for example, 1 and 2:

dateset
id    start
1     2015-01-01
2     2015-01-01

It doesn't really matter how we link these two rows. The end result could be

reminder
id    start         dateset_id
3     2015-01-01    1
7     2015-01-01    2

or

reminder
id    start         dateset_id
3     2015-01-01    2
7     2015-01-01    1

Both of these variants are correct. Which brings us to the following solution.

Simply insert all rows first.

INSERT INTO dateset (start)
SELECT start FROM reminder;

Then join the two tables on the start column, knowing that it is not unique. "Make it" unique by adding a ROW_NUMBER within each start value and joining on both columns. It is possible to make the query shorter, but I spelled out each step explicitly:

WITH
CTE_reminder_rn
AS
(
    SELECT
        id
        ,start
        ,dateset_id
        ,ROW_NUMBER() OVER (PARTITION BY start ORDER BY id) AS rn
    FROM reminder
)
,CTE_dateset_rn
AS
(
    SELECT
        id
        ,start
        ,ROW_NUMBER() OVER (PARTITION BY start ORDER BY id) AS rn
    FROM dateset
)
,CTE_Joined
AS
(
    SELECT
        CTE_reminder_rn.id AS reminder_id
        ,CTE_reminder_rn.dateset_id AS old_dateset_id
        ,CTE_dateset_rn.id AS new_dateset_id
    FROM
        CTE_reminder_rn
        INNER JOIN CTE_dateset_rn ON 
            CTE_dateset_rn.start = CTE_reminder_rn.start AND
            CTE_dateset_rn.rn = CTE_reminder_rn.rn
)
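-- PostgreSQL cannot update a CTE directly, so update reminder and join to the CTE
UPDATE reminder
SET dateset_id = CTE_Joined.new_dateset_id
FROM CTE_Joined
WHERE reminder.id = CTE_Joined.reminder_id
;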

I hope it is clear from the code what it does, especially when you compare it to the simpler version without ROW_NUMBER. The more complex solution also works when start is unique, but it is not as efficient as the simple one.

This solution assumes that dateset is empty before this process; otherwise, pre-existing dateset rows with the same start values would take part in the join.
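After either variant has run, the result can be sanity-checked. The queries below are a sketch, assuming the same reminder and dateset tables and that dateset was empty beforehand; both are expected to return no rows:

```sql
-- Reminders that ended up without a matching dateset row,
-- or that point at a dateset row with a different start value
SELECT r.id, r.start, r.dateset_id
FROM reminder AS r
LEFT JOIN dateset AS d ON d.id = r.dateset_id
WHERE d.id IS NULL
   OR d.start IS DISTINCT FROM r.start;

-- dateset rows that are referenced by more than one reminder
SELECT dateset_id, COUNT(*) AS references_count
FROM reminder
WHERE dateset_id IS NOT NULL
GROUP BY dateset_id
HAVING COUNT(*) > 1;
```

Note that reminders with a NULL start would not be matched by an equality join on start, so they would show up in the first query.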


Update based on changes in Postgres:

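According to the documentation, since Postgres 9.1 data-modifying statements such as INSERT ... RETURNING are supported inside WITH clauses attached to the top-level statement. They are still not accepted as ordinary subqueries, so the hypothetical DML subquery from the original answer remains invalid and is repeated here only for illustration:

-- hypothetical syntax, not accepted by Postgres
UPDATE reminder SET dateset_id = (
    INSERT INTO dateset (start)
    VALUES (reminder.start)
    RETURNING dateset.id);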

Original answer:

Here's another way of doing it, distinct from the 3 ways Vladimir suggested so far.

A temporary function will let you read the id of the new rows created as well as other values in the query:

--minimal demonstration schema
CREATE TABLE dateset (
  id SERIAL PRIMARY KEY,
  start TIMESTAMP
  -- other things here...
);

CREATE TABLE reminder (
  id SERIAL PRIMARY KEY,
  start TIMESTAMP,
  dateset_id INTEGER REFERENCES dateset(id)
  -- other things here...
);

--pre-migration data
INSERT INTO reminder (start) VALUES ('2014-02-14'), ('2014-09-06'), ('1984-01-01'), ('2014-02-14');

--all at once
BEGIN;

CREATE FUNCTION insertreturning(ts TIMESTAMP) RETURNS INTEGER AS $$
    INSERT INTO dateset (start)
    VALUES (ts)
    RETURNING dateset.id;
  $$ LANGUAGE SQL;

UPDATE reminder SET dateset_id = insertreturning(reminder.start);

DROP FUNCTION insertreturning(TIMESTAMP);

ALTER TABLE reminder DROP COLUMN start;

END;

This approach to the problem suggested itself after I realized that writing INSERT ... RETURNING as a subquery would solve the issue; although INSERTs are not allowed as subqueries, calls to functions certainly are.

Intriguingly, this suggests that DML subqueries that return values might be broadly useful. If they were possible, we would just write:

UPDATE reminder SET dateset_id = (
    INSERT INTO dateset (start)
    VALUES (reminder.start)
    RETURNING dateset.id);

RETURNING can only return columns of the rows inserted into the target table, not columns of the table you select from. So, if you are willing to add a column reminder_id to your dateset table,

    ALTER TABLE dateset ADD COLUMN reminder_id integer;

the following statement will work:

WITH inserted_datesets AS (
  INSERT INTO dateset (start, reminder_id)
  SELECT start, id FROM reminder
  RETURNING reminder_id, id AS dateset_id
)
UPDATE reminder
SET dateset_id = ids.dateset_id
FROM inserted_datesets AS ids
WHERE reminder.id = ids.reminder_id;

The following 2 statements will work as well, but only if the values of the column start in reminder are all unique; otherwise the subquery returns more than one row and the UPDATE fails:

INSERT INTO dateset(start) SELECT start FROM reminder;
UPDATE reminder SET dateset_id = (SELECT id FROM dateset WHERE start=reminder.start);
