Database "frozen" on ALTER TABLE

The command you wish to run does take an ACCESS EXCLUSIVE lock on the table, preventing all other access to that table. But the duration of this lock should be just a few milliseconds, because adding a column like the one you want to add does not require the table to be rewritten; it only requires the metadata to be updated.

Where the problem can come in, and I bet you dollars to donuts that it is the problem you are seeing, is in lock priorities. Someone has a weak lock, like ACCESS SHARE lock, on that table, and they are camping on it indefinitely (maybe an idle-in-transaction connection which has been leaked? Someone who opened psql, started a query in a repeatable read mode, and then went on vacation?).
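If you suspect a leaked idle-in-transaction connection, you can list the candidates directly from pg_stat_activity before anything even gets stuck. A sketch (column names as in the 9.x releases; adjust for your version):

```sql
-- List sessions that are sitting idle inside an open transaction,
-- oldest transaction first -- these are the usual lock campers.
select pid, usename, state, xact_start, query
from pg_stat_activity
where state = 'idle in transaction'
order by xact_start;
```

Anything near the top of that list with an old xact_start is worth investigating.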

The ADD COLUMN tries to take the ACCESS EXCLUSIVE it needs, and it queues up behind the first lock.

Now all future lock requests queue up behind the waiting ACCESS EXCLUSIVE request.

Conceptually, incoming lock requests which are compatible with the already-granted lock could jump over the waiting ACCESS EXCLUSIVE and be granted out of turn, but that is not how PostgreSQL does it.
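One way to keep this queueing behaviour from freezing the whole application is to put a lock_timeout on the session running the DDL, so the ALTER TABLE gives up instead of camping in the queue and blocking everyone behind it (lock_timeout is available in 9.3 and later; the column name here is just a placeholder):

```sql
begin;
-- Abort the ALTER TABLE if the ACCESS EXCLUSIVE lock cannot be
-- acquired within 5 seconds, rather than queueing indefinitely.
set local lock_timeout = '5s';
alter table cliente add column nueva_columna text;  -- hypothetical column
commit;
```

If the lock cannot be acquired in time, the statement fails with an error and you can simply retry later, with no pile-up behind it in the meantime.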

You need to find the process which is holding the long-lived weak lock.

You can do this by querying the pg_locks table.

select * from pg_locks where 
    granted and relation = 'cliente'::regclass \x\g\x

If you do this while everything is locked up, you should get only one answer (unless there are multiple long-lived culprits). If you do this after you have already killed the ADD COLUMN, then you might see lots of granted locks, but if you repeat it a few times there should be one or a few that stick around each time.

You can then take the PID that you got from pg_locks, and query with that into pg_stat_activity to see what the offender is doing:

select * from pg_stat_activity where pid=28731 \x\g\x

...

backend_start    | 2016-03-22 13:08:30.849405-07
xact_start       | 2016-03-22 13:08:36.797703-07
query_start      | 2016-03-22 13:08:36.799021-07
state_change     | 2016-03-22 13:08:36.824369-07
waiting          | f
state            | idle in transaction
backend_xid      |
backend_xmin     |
query            | select * from cliente limit 4;

So, it ran a query, inside a transaction, and then went idle without ever closing the transaction. It is now 13:13, so they have been idle for 5 minutes.
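Once you have identified the offender, you can boot it off the server to release its locks. Note that pg_cancel_backend() only cancels the currently running query, which does nothing for a session that is *idle* in transaction, so pg_terminate_backend() is the one you want here:

```sql
-- Terminate the idle-in-transaction backend, rolling back its open
-- transaction and releasing its locks. PID taken from pg_stat_activity.
select pg_terminate_backend(28731);
```

The leaked session's client will see its connection drop, and the waiting ADD COLUMN (plus everything queued behind it) should proceed within milliseconds.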


DDL operations usually lock the object they are acting upon, so they should not be performed outside planned maintenance windows (when your users are expecting disruption, or expecting the system to be completely offline for up to a planned amount of time). There is not much you can do about this easily¹.

Some operations take only a write lock, so your application can keep serving requests that merely read the affected objects.

The documentation seems pretty good at listing what locks are likely to be held by DDL operations.

This blog entry has a summary suggesting that adding a column can be an online operation if the column is nullable and has no default value or unique constraint. That implies the statement you quote should have run without a long-held lock (IIRC Postgres defaults columns to being NULLable unless you explicitly state otherwise). Did you run any other operations after the ADD COLUMN? Perhaps creating an index on it (which takes a write lock on the table by default)?
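If an index was indeed the culprit, it can usually be built without blocking writes by using CREATE INDEX CONCURRENTLY instead of plain CREATE INDEX. It is slower and cannot run inside a transaction block, but reads and writes to the table continue while it builds (index and column names below are illustrative):

```sql
-- Builds the index without taking a lock that blocks writes.
-- Must be run outside a transaction block; if it fails partway,
-- it leaves an INVALID index behind that you must drop and retry.
create index concurrently idx_cliente_nueva on cliente (nueva_columna);
```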

¹ Some replication/clustering/mirroring arrangements would allow you to update a mirror (pausing updates to it during the change and replaying them after), switch over to using that copy as the live one, and so on until each copy is updated, so the downtime is limited to the time it takes to replay the changes made during the DDL operation. Live operations like that are not without risk, though, so unless you absolutely can't avoid it, it is recommended that you instead arrange a proper maintenance window in which to perform and verify structural updates.