Getting last modification date of a PostgreSQL database table

There is no reliable, authoritative record of the last modified time of a table. Using the modification time of the table's relfilenode (its on-disk file) is wrong for a lot of reasons:

  • Writes are initially recorded to the write-ahead log (WAL), then lazily to the heap (the table files). Once the record is in WAL, Pg doesn't rush to write it to the heap, and it might not even get written until the next system checkpoint;

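  • Every relation has several forks (main data, free space map, visibility map), and larger tables are additionally split into multiple 1 GB segment files, so you'd have to check all the files and pick the newest timestamp;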

  • A simple SELECT can generate write activity to the underlying table due to hint-bit setting;

  • autovacuum and other maintenance that doesn't change the user-visible data still modifies the relation files;

  • some operations, like VACUUM FULL, will replace the relfilenode. It might not be where you expect if you're trying to look at it concurrently without taking an appropriate lock.
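
For illustration only, this is roughly what the file-based approach looks like, so you can see for yourself why it is unreliable. It is a sketch that assumes a hypothetical table called my_table and a superuser session (pg_stat_file is restricted by default). It only inspects the first segment of the main fork.

```sql
-- Path of the table's main fork, relative to the data directory.
SELECT pg_relation_filepath('my_table');

-- Modification time of that one file as reported by the filesystem.
-- This reflects when PostgreSQL last wrote the file (checkpoint, hint bits,
-- vacuum, ...), NOT when a user last changed the data.
SELECT (pg_stat_file(pg_relation_filepath('my_table'))).modification;
```

Run it before and after a plain SELECT or a VACUUM on the table and the timestamp may move even though no row was changed.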

A few options

If you don't need reliability, you can potentially use the information in pg_stat_database and pg_stat_all_tables. These can give you the time of the last stats reset, and activity counters since the last stats reset. They don't tell you when the most recent activity was, only that it happened since the last stats reset, and there's no information about what happened before that reset. So it's limited, but it's already there.
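
A minimal sketch of what that looks like, assuming a hypothetical table named my_table in the current database. The column names are the ones these views expose; the table name is made up.

```sql
-- Row-change counters for one table, plus the time the database-wide
-- statistics were last reset. Non-zero counters only mean "something
-- changed at some point after stats_reset".
SELECT s.schemaname,
       s.relname,
       s.n_tup_ins,      -- rows inserted since the reset
       s.n_tup_upd,      -- rows updated since the reset
       s.n_tup_del,      -- rows deleted since the reset
       d.stats_reset     -- when the counters were last zeroed
FROM pg_stat_all_tables AS s
CROSS JOIN pg_stat_database AS d
WHERE d.datname = current_database()
  AND s.relname = 'my_table';
```

If you snapshot the counters periodically and compare, you can narrow the window in which a change happened, but you still won't get an exact time.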

One option for doing it reliably is to use a trigger to update a table containing the last-modified times for each table. Be aware that doing so will serialize all writes to the table, destroying concurrency. It will also add a fair bit of overhead to every transaction. I don't recommend it.
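
To make the cost concrete, here is a sketch of that trigger. The names table_last_modified, record_table_modified and my_table are hypothetical, and the upsert uses INSERT ... ON CONFLICT, which assumes PostgreSQL 9.5 or later.

```sql
-- One row per tracked table (hypothetical bookkeeping table).
CREATE TABLE IF NOT EXISTS table_last_modified (
    relation_oid oid PRIMARY KEY,
    modified_at  timestamptz NOT NULL
);

CREATE OR REPLACE FUNCTION record_table_modified() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- TG_RELID is the oid of the table that fired the trigger.
    INSERT INTO table_last_modified (relation_oid, modified_at)
    VALUES (TG_RELID, clock_timestamp())
    ON CONFLICT (relation_oid)
    DO UPDATE SET modified_at = EXCLUDED.modified_at;
    RETURN NULL;  -- return value is ignored for statement-level triggers
END;
$$;

CREATE TRIGGER my_table_record_modified
AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE ON my_table
FOR EACH STATEMENT EXECUTE PROCEDURE record_table_modified();
```

Every writing transaction updates the same row in table_last_modified and holds its row lock until commit, which is exactly where the serialization comes from.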

A slightly less awful alternative is to use LISTEN and NOTIFY. Have an external daemon process connect to PostgreSQL and LISTEN for events. Use ON INSERT OR UPDATE OR DELETE triggers to send NOTIFYs when a table changes, with the table oid as the notify payload. These get sent when the transaction commits. Your daemon can accumulate change notifications and lazily write them back to a table in the database. If the system crashes, you lose your record of most recent modifications, but that's ok, you just treat all tables as just-modified if you're starting up after a crash.
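
The database side of that can be sketched as follows. The channel name table_changed, the function name and my_table are all hypothetical; the daemon itself can be written in any language with a client library that supports LISTEN.

```sql
CREATE OR REPLACE FUNCTION notify_table_changed() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Payload is the oid of the changed table, as text.
    -- The notification is delivered only if the transaction commits.
    PERFORM pg_notify('table_changed', TG_RELID::text);
    RETURN NULL;  -- return value is ignored for statement-level triggers
END;
$$;

CREATE TRIGGER my_table_notify_change
AFTER INSERT OR UPDATE OR DELETE ON my_table
FOR EACH STATEMENT EXECUTE PROCEDURE notify_table_changed();

-- In the daemon's session:
LISTEN table_changed;
```

The daemon records the time it receives each notification, so the stored value is "roughly when the commit happened" rather than an exact commit time.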

To avoid the worst of the concurrency issues you could instead log the change timestamps using a BEFORE INSERT OR UPDATE OR DELETE OR TRUNCATE ON tablename FOR EACH STATEMENT EXECUTE PROCEDURE trigger, generalized to take the relation oid as a parameter. This would insert a (relation_oid, timestamp) pair into a change-logging table. You then have a helper process on a separate connection, or called periodically by your app, aggregate that table for the latest info, merge it into a summary table of most recent changes, and truncate the log table. The only advantage of this over the listen/notify approach is that it doesn't lose information on crash, but it's even less efficient.
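
A sketch of the logging variant, with hypothetical names throughout (table_change_log, table_last_modified, log_table_change, my_table). Two liberties are taken: the trigger function reads TG_RELID instead of taking the oid as an argument, and the periodic merge drains the log with DELETE ... RETURNING rather than TRUNCATE, so that rows logged while the merge is running are not thrown away. ON CONFLICT assumes PostgreSQL 9.5 or later.

```sql
-- Append-only log: concurrent writers only insert, so they don't block each other.
CREATE TABLE IF NOT EXISTS table_change_log (
    relation_oid oid         NOT NULL,
    changed_at   timestamptz NOT NULL DEFAULT clock_timestamp()
);

-- Summary table maintained by the helper process.
CREATE TABLE IF NOT EXISTS table_last_modified (
    relation_oid oid PRIMARY KEY,
    modified_at  timestamptz NOT NULL
);

CREATE OR REPLACE FUNCTION log_table_change() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO table_change_log (relation_oid) VALUES (TG_RELID);
    RETURN NULL;  -- return value is ignored for statement-level triggers
END;
$$;

CREATE TRIGGER my_table_log_change
BEFORE INSERT OR UPDATE OR DELETE OR TRUNCATE ON my_table
FOR EACH STATEMENT EXECUTE PROCEDURE log_table_change();

-- Run periodically by the helper process: drain the log and merge the
-- newest timestamp per table into the summary.
WITH drained AS (
    DELETE FROM table_change_log
    RETURNING relation_oid, changed_at
)
INSERT INTO table_last_modified (relation_oid, modified_at)
SELECT relation_oid, max(changed_at)
FROM drained
GROUP BY relation_oid
ON CONFLICT (relation_oid)
DO UPDATE SET modified_at = GREATEST(table_last_modified.modified_at,
                                     EXCLUDED.modified_at);
```

Note that changed_at is the time the statement ran, not the time its transaction committed, so a long transaction can make a change appear older than it really is.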

Another approach might be to write a C extension function that uses (eg) ProcessUtility_hook, ExecutorRun_hook, etc to trap table changes and lazily update stats. I haven't looked to see how practical this would be; take a look at the various _hook options in the sources.

The best way would be to patch the statistics code to record this information and submit a patch to PostgreSQL for inclusion in core. Don't just start by writing code; raise your idea on -hackers once you've thought about it enough to have a well defined way to do it (i.e. start by reading the code, don't just post asking "how do I ..."). It might be nice to add last-updated times to pg_stat_..., but you'd have to convince the community it was worth the overhead or provide a way to make it optionally tracked - and you'd have to write the code to keep the stats and submit a patch, because only somebody who wants this feature is going to bother with that.

How I'd do it

If I had to do this, and didn't have the time to write a patch to do it properly, I'd probably use the listen/notify approach outlined above.

Update for PostgreSQL 9.5 commit timestamps

Update: PostgreSQL 9.5 has commit timestamps. If you have them enabled in postgresql.conf (and they were already enabled when the rows were written), you can check the commit timestamp for the row with the greatest xmin to approximate the last modified time. It's only an approximation because if the most recent rows have been deleted they won't be counted.

Also, commit timestamp records are only kept for a limited time. So if you want to tell when a rarely modified table was last modified, the answer will effectively be "dunno, a while ago".
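
As a sketch, assuming track_commit_timestamp is on and a hypothetical table my_table, the approximation can be written as a single aggregate. It takes the newest commit timestamp over all live rows instead of looking up the greatest xmin, which sidesteps having to compare transaction ids.

```sql
-- Newest commit time among the rows currently visible in the table.
-- Scans the whole table. Rows whose commit timestamp is no longer
-- available contribute NULL, which max() ignores.
SELECT max(pg_xact_commit_timestamp(xmin)) AS approx_last_modified
FROM my_table;
```

A NULL result means no timestamp is available for any visible row, not that the table was never modified.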


PostgreSQL 9.5 lets us track the commit time of the last modification.

  1. Check whether commit timestamp tracking is on or off using the following query

    show track_commit_timestamp;
    
  2. If it returns "on", go to step 5; otherwise modify postgresql.conf

    cd /etc/postgresql/9.5/main/
    vi postgresql.conf
    

    Change

    track_commit_timestamp = off
    

    to

    track_commit_timestamp = on
    
  3. Restart the PostgreSQL server

  4. Repeat step 1.

  5. Use the following queries to see the commit timestamp of each row's last change. Only changes committed after tracking was enabled have a timestamp.

    SELECT pg_xact_commit_timestamp(xmin), * FROM YOUR_TABLE_NAME;
    
    SELECT pg_xact_commit_timestamp(xmin), * FROM YOUR_TABLE_NAME WHERE COLUMN_NAME=VALUE;
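
If you would rather not edit postgresql.conf by hand, the same setting can be changed from a superuser SQL session. This is a sketch of that alternative; ALTER SYSTEM writes the value to postgresql.auto.conf, and a server restart is still required before it takes effect.

```sql
-- Persist the setting (written to postgresql.auto.conf).
ALTER SYSTEM SET track_commit_timestamp = on;

-- After restarting the server, confirm it is active.
SHOW track_commit_timestamp;
```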
    

Yes, this is expected behaviour: data about a change is written to the transaction log immediately, while the data files may only be updated later, with a delay of up to checkpoint_timeout (default is 5 minutes). Postgres doesn't permanently store the kind of timestamp you are asking for.
