Best way to populate a new column in a large table?

It very much depends on details of your setup and requirements.

Note that since Postgres 11, only adding a column with a volatile DEFAULT still triggers a table rewrite. Unfortunately, this is your case.
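For illustration - a sketch only, where flag is a hypothetical example column and tbl_uuid is the column added further down:

-- a constant default is a cheap catalog-only change since Postgres 11 (no rewrite)
ALTER TABLE tbl ADD COLUMN flag boolean DEFAULT false;

-- a volatile default like a freshly generated UUID still rewrites the whole table
ALTER TABLE tbl ADD COLUMN tbl_uuid uuid DEFAULT uuid_generate_v1();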

If you have sufficient free space on disk - at least 110 % of pg_total_relation_size(tbl) - and can afford a SHARE lock for some time and an exclusive lock for a very short time, then create a new table including the uuid column using CREATE TABLE AS. Why?

  • What causes large INSERT to slow down and disk usage to explode?

The code below uses a function from the additional uuid-ossp module.
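If the module is not installed in your database yet, something along these lines does it (requires appropriate privileges and the contrib package on the server):

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";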

  • Lock the table against concurrent changes in SHARE mode (still allowing concurrent reads). Attempts to write to the table will wait and eventually fail. See below.

  • Copy the whole table while populating the new column on the fly - possibly ordering rows favorably while being at it.
    If you are going to reorder rows, be sure to set work_mem high enough to do the sort in RAM or as high as you can afford (just for your session, not globally).

  • Then add constraints, foreign keys, indices, triggers etc. to the new table. When updating large portions of a table it is much faster to create indices from scratch than to add rows iteratively. Related advice in the manual.

  • When the new table is ready, drop the old and rename the new to make it a drop-in replacement. Only this last step acquires an exclusive lock on the old table for the rest of the transaction - which should be very short now.
    It also requires that you delete any object depending on the table type (views, functions using the table type in the signature, ...) and recreate them afterwards.

  • Do it all in one transaction to avoid incomplete states.

BEGIN;
LOCK TABLE tbl IN SHARE MODE;

SET LOCAL work_mem = '???? MB';  -- just for this transaction

CREATE TABLE tbl_new AS 
SELECT uuid_generate_v1() AS tbl_uuid, <list of all columns in order>
FROM   tbl
ORDER  BY ??;  -- optionally order rows favorably while being at it.

ALTER TABLE tbl_new
   ALTER COLUMN tbl_uuid SET NOT NULL
 , ALTER COLUMN tbl_uuid SET DEFAULT uuid_generate_v1()
 , ADD CONSTRAINT tbl_uuid_uni UNIQUE(tbl_uuid);

-- more constraints, indices, triggers?

DROP TABLE tbl;
ALTER TABLE tbl_new RENAME TO tbl;

-- recreate views etc. if any
COMMIT;

This should be fastest. Any other method of updating in place has to rewrite the whole table as well, just in a more expensive fashion. You would only go that route if you don't have enough free space on disk or cannot afford to lock the whole table or generate errors for concurrent write attempts.

What happens to concurrent writes?

Other transactions (in other sessions) trying to INSERT / UPDATE / DELETE on the same table after your transaction has taken the SHARE lock will wait until the lock is released or a timeout kicks in, whichever comes first. Either way they will fail, since the table they were trying to write to has been dropped out from under them.

The new table has a new table OID, but concurrent transactions have already resolved the table name to the OID of the previous table. When the lock is finally released, they try to lock the table themselves before writing to it and find that it's gone. Postgres will answer:

ERROR: could not open relation with OID 123456

Here 123456 is the OID of the old table. To handle this, catch the exception in your app code and retry the query.
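As a purely diagnostic aid (not part of the fix), you can check which OID a table name currently resolves to:

SELECT 'tbl'::regclass::oid;  -- the OID currently behind the name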

If you cannot afford that to happen, you have to keep your original table.

Keeping the existing table, alternative 1

Update in place (possibly running the update on small segments at a time) before you add the NOT NULL constraint. Adding a new column with NULL values and without NOT NULL constraint is cheap.
Since Postgres 9.2 you can also create a CHECK constraint with NOT VALID:

The constraint will still be enforced against subsequent inserts or updates

That allows you to update rows peu à peu - in multiple separate transactions. This avoids keeping row locks for too long and it also allows dead rows to be reused. (You'll have to run VACUUM manually if there is not enough time in between for autovacuum to kick in.) Finally, add the NOT NULL constraint and remove the NOT VALID CHECK constraint:

ALTER TABLE tbl ADD CONSTRAINT tbl_no_null CHECK (tbl_uuid IS NOT NULL) NOT VALID;

-- update rows in multiple batches in separate transactions
-- possibly run VACUUM between transactions

ALTER TABLE tbl ALTER COLUMN tbl_uuid SET NOT NULL;
ALTER TABLE tbl DROP CONSTRAINT tbl_no_null;
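The batched updates referred to in the comment above might look like this - a sketch assuming a serial primary key tbl_id; the column name and range are placeholders for whatever batching scheme fits your data:

UPDATE tbl
SET    tbl_uuid = uuid_generate_v1()
WHERE  tbl_uuid IS NULL
AND    tbl_id BETWEEN 1 AND 50000;  -- next transaction takes the next range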

Related answer discussing NOT VALID in more detail:

  • Disable all constraints and table checks while restoring a dump

Keeping the existing table, alternative 2

Prepare the new state in a temporary table, TRUNCATE the original and refill from the temp table. All in one transaction. You still need to take a SHARE lock before preparing the new table to prevent losing concurrent writes.
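A minimal sketch of that route, reusing the placeholder names from the first code block; note that adding the column and TRUNCATE escalate to an exclusive lock inside the same transaction:

BEGIN;
LOCK TABLE tbl IN SHARE MODE;

CREATE TEMP TABLE tbl_tmp AS
SELECT uuid_generate_v1() AS tbl_uuid, t.*
FROM   tbl t;

ALTER TABLE tbl ADD COLUMN tbl_uuid uuid;  -- cheap: no default, all NULL

TRUNCATE tbl;  -- needs ACCESS EXCLUSIVE

INSERT INTO tbl (tbl_uuid, <list of all columns in order>)
SELECT tbl_uuid, <list of all columns in order>
FROM   tbl_tmp;

ALTER TABLE tbl ALTER COLUMN tbl_uuid SET NOT NULL;
COMMIT;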

Details in these related answers on SO:

  • Best way to delete millions of rows by ID
  • Add new column without table lock?

I don't have a "best" answer, but I have a "least bad" answer that might let you get things done reasonably fast.

My table had 2MM rows and the update performance was chugging when I tried to add a secondary timestamp column that defaulted to the first.

ALTER TABLE mytable ADD new_timestamp TIMESTAMP ;
UPDATE mytable SET new_timestamp = old_timestamp ;
ALTER TABLE mytable ALTER new_timestamp SET NOT NULL ;

After it hung for 40 minutes, I tried this on a small batch to get an idea of how long this could take -- the forecast was around 8 hours.

The accepted answer is definitely better -- but this table is heavily used in my database. A few dozen tables have FOREIGN KEYs onto it, and I wanted to avoid switching those FOREIGN KEYs over on so many tables. And then there are views.

A bit of searching docs, case-studies and StackOverflow, and I had the "A-Ha!" moment. The drain wasn't on the core UPDATE, but on all the INDEX operations. My table had 12 indexes on it -- a few for unique constraints, a few for speeding up the query planner, and a few for fulltext search.

Every row that was UPDATEd wasn't just doing a DELETE/INSERT internally; it also carried the overhead of updating every one of those indexes and checking constraints.

My solution was to drop every index and constraint, update the table, then add all the indexes/constraints back in.
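Before dropping anything, it helps to capture the existing definitions so they can be re-created verbatim afterwards; the system catalogs have them (mytable as in the example above):

SELECT indexdef
FROM   pg_indexes
WHERE  tablename = 'mytable';

SELECT conname, pg_get_constraintdef(oid)
FROM   pg_constraint
WHERE  conrelid = 'mytable'::regclass;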

It took about 3 minutes to write a SQL transaction that did the following:

  • BEGIN;
  • drop indexes/constraints
  • update the table
  • re-add indexes/constraints
  • COMMIT;
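A condensed sketch of such a transaction - the index and constraint names (and the columns body and code) are made up here; the real definitions come from the catalog queries above:

BEGIN;

-- drop whatever the catalog queries reported
DROP INDEX mytable_body_fts_idx;
ALTER TABLE mytable DROP CONSTRAINT mytable_code_key;
-- ... repeat for the rest

-- the bulk update now only has to touch the heap
UPDATE mytable SET new_timestamp = old_timestamp;

-- recreate everything from the saved definitions
ALTER TABLE mytable ADD CONSTRAINT mytable_code_key UNIQUE (code);
CREATE INDEX mytable_body_fts_idx ON mytable USING gin (to_tsvector('english', body));
-- ... repeat for the rest

ALTER TABLE mytable ALTER new_timestamp SET NOT NULL;

COMMIT;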

The script took 7 minutes to run.

The accepted answer is definitely better and more proper... and virtually eliminates the need for downtime. In my case, though, it would have taken significantly more "Developer" work to use that solution, and we had a 30-minute window of scheduled downtime in which it could be accomplished. Our solution addressed it in 10.