Is there any way to determine how many rows a transaction has modified (before the transaction ends)?

Following @Nick Barnes's suggestion to use pg_table_size, I was able to obtain a rough estimate of progress. Note, however, that this only works when a single transaction is modifying the table. If several transactions are modifying it, you cannot tell how far a specific one has progressed.

Moreover, to get a progress estimate you need to know how many rows the query affects.

So using the command:

SELECT pg_table_size('my_table');

you can obtain the size of the table (this includes both committed and uncommitted data).

If you know the initial size of the table initial_size, the initial number of rows N, and the number K of rows affected by the query, you can estimate the amount of data that your query will write:

delta_size = (initial_size/N)*K

Now if the current table size is current_size your estimate for progress will be:

progress_perc = 100*(current_size - initial_size)/delta_size
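
As a quick sanity check, the two formulas above can be wrapped in a small helper. This is only a sketch: the function name and arguments are made up for illustration, the sizes are whatever pg_table_size returned (in bytes), and it assumes every row is roughly the same size.

```python
def estimate_progress(initial_size, current_size, n_rows, k_rows):
    """Rough progress (in percent) of a query that rewrites k_rows rows.

    initial_size / current_size: pg_table_size() before and during the query.
    n_rows: number of rows in the table before the query started.
    k_rows: number of rows the query is expected to affect.
    Assumes all rows have about the same size (hypothetical helper).
    """
    if n_rows <= 0 or k_rows <= 0:
        raise ValueError("n_rows and k_rows must be positive")
    # Expected amount of data written: average row size times affected rows.
    delta_size = (initial_size / n_rows) * k_rows
    return 100 * (current_size - initial_size) / delta_size


# A 1,000,000-byte table with 10,000 rows, a query touching 5,000 of them,
# and a current size of 1,250,000 bytes: about half of the work is done.
print(estimate_progress(1_000_000, 1_250_000, 10_000, 5_000))  # 50.0
```

The result can exceed 100 or lag behind the real progress, since the table size also depends on free space reuse and on how row sizes vary.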

If you do not know the initial size of the table you can estimate it by checking the growth of the table.

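Using pg_table_size, check how much the table grows (growth) over a fixed interval of time delta_t, say 1 hour. Assuming the table has grown at a constant rate, and with num_delta_t_passed being the number of such intervals elapsed since the transaction started, you can estimate the initial size from the equation: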

initial_size = current_size - growth*num_delta_t_passed
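
The equation above can be sketched the same way, working in seconds rather than whole intervals. Again the names are hypothetical, and the estimate is only as good as the assumption that the table has been growing at a constant rate since the transaction started.

```python
def estimate_initial_size(size_then, size_now, interval_seconds, elapsed_seconds):
    """Back-extrapolate the table size at the start of the transaction.

    size_then / size_now: two pg_table_size() samples taken interval_seconds apart.
    elapsed_seconds: time since the transaction started, measured at the
    moment size_now was sampled.
    Assumes constant growth over the whole transaction (hypothetical helper).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    growth_per_second = (size_now - size_then) / interval_seconds
    return size_now - growth_per_second * elapsed_seconds


# The table grew by 100,000 bytes over the last hour and the transaction
# has been running for 4 hours, so it started at roughly 1,000,000 bytes.
initial = estimate_initial_size(1_300_000, 1_400_000, 3600, 4 * 3600)
```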

You can obtain the time elapsed since the query started using the query below (use xact_start instead of query_start if you want the start of the enclosing transaction):

SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE state <> 'idle' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start DESC;

(taken from this gist page)

For PostgreSQL 9.6 it does not seem possible to obtain an exact figure this way. I don't know whether a better option exists for PostgreSQL 10+.


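With PostgreSQL's implementation of MVCC, you can query the xmax system column, which tells you how many rows are currently being updated by the other transaction (assuming no other transaction is writing to that table at the same time). Keep in mind that xmax is also set by row locks such as SELECT ... FOR UPDATE and stays set on rows touched by rolled-back transactions, so the count can be higher than the number of rows actually updated.

So something like: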

select count(1) from my_table where xmax <> 0;

should give you the number of rows updated.


There is at least one way to peek at the progress of an uncommitted UPDATE, though it's a bit heavy-handed.

Postgres handles transaction isolation through row versioning. Its implementation involves tagging every record version with the IDs of the transaction that created it and the transaction that deleted or superseded it (xmin and xmax, respectively), which together determine which transactions are allowed to see it.

Under this scheme, an UPDATE works by setting the xmax of the target record to the current transaction ID (equivalent to a DELETE) and creating an updated copy with the transaction ID in xmin (equivalent to an INSERT).

These system columns can be queried, so given the transaction ID of the UPDATE (which you can get from pg_stat_activity.backend_xid), you can find out how many rows it's processed with e.g.:

-- 2357 stands in for the backend_xid of the UPDATE's transaction
SELECT COUNT(*)
FROM my_table
WHERE xmax = 2357;

Things get a bit messier if the transaction has set any savepoints, in which case the xmax will be a subtransaction ID, which doesn't appear in pg_stat_activity (or anywhere else, as far as I'm aware). In that case, you can inspect all rows which have been marked for update/deletion, by either in-progress or rolled-back transactions, with:

SELECT xmax, COUNT(*)
FROM my_table
WHERE xmax <> 0
GROUP BY xmax;

... and from there, it shouldn't be too hard to figure out which ID is the one you're interested in.