How to get the current free disk space in Postgres?

PostgreSQL does not currently have features to directly expose disk space.

For one thing, which disk? A production PostgreSQL instance often looks like this:

  • /pg/pg94/: a RAID6 of fast reliable storage on a BBU RAID controller in WB mode, for the catalogs and most important data
  • /pg/pg94/pg_xlog: a fast reliable RAID1, for the transaction logs
  • /pg/tablespace-lowredundancy: a RAID10 of fast, cheap storage for things like indexes and UNLOGGED tables that you don't mind losing, so lower-redundancy storage is acceptable
  • /pg/tablespace-bulkdata: a RAID6 or similar of slow near-line magnetic storage used for old audit logs, historical data, write-mostly data, and other things that can be slower to access.
  • The PostgreSQL logs are usually somewhere else again, but if that location fills up, the system may still stop. Where they go depends on a number of configuration settings, some of which you can't see from PostgreSQL at all, like syslog options.

Then there's the fact that "free" space doesn't necessarily mean PostgreSQL can use it (think: disk quotas, system-reserved disk space), and the fact that free blocks/bytes isn't the only constraint, as many file systems also have limits on number of files (inodes).
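
To make the distinction concrete, here is a minimal Python sketch, assuming a POSIX host. The function name space_report is made up for illustration. It reads three different figures for the same file system with os.statvfs: bytes free in total, bytes an unprivileged process can actually use, and inodes available. It does not attempt to account for quotas.

```python
import os

def space_report(path):
    """Return free-space figures for the file system containing `path`."""
    st = os.statvfs(path)
    return {
        # Free bytes, including blocks reserved for the superuser.
        "bytes_free_total": st.f_bfree * st.f_frsize,
        # Free bytes usable by an unprivileged process such as postgres.
        "bytes_available": st.f_bavail * st.f_frsize,
        # Inodes available to unprivileged processes (may be 0 on file
        # systems that do not report a fixed inode count).
        "inodes_available": st.f_favail,
    }

if __name__ == "__main__":
    print(space_report("/"))
```

On a file system with a root-reserved percentage, the first two figures differ, which is exactly why a single "free space" number is ambiguous.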

How does a SELECT pg_get_free_disk_space() report this?

Knowing the free disk space could be a security concern. If supported, it's something that'd only be exposed to the superuser, at least.

What you can do is use an untrusted procedural language like plpythonu to make operating system calls that interrogate the host OS for disk space information. Query pg_catalog.pg_tablespace and read the data_directory setting from pg_settings to discover where PostgreSQL keeps its files on the host. You also have to check for mount points (Unix/Mac) or junction points (Windows) to discover whether pg_xlog, etc., are on separate storage. This still won't really help you with space for logs, though.
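
The mount point check can be sketched in plain Python as below. This is only an illustration: the helper names mount_point and on_separate_storage are hypothetical, and the approach (comparing st_dev device IDs) will not distinguish two bind mounts of the same device.

```python
import os

def mount_point(path):
    """Walk up from `path` until a mount point is reached."""
    path = os.path.realpath(path)  # resolve symlinks such as a relocated pg_xlog
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path

def on_separate_storage(parent, child):
    """True if `child` is on a different file system than `parent`."""
    # os.stat follows symlinks, so a symlinked directory is compared by its target.
    return os.stat(parent).st_dev != os.stat(child).st_dev
```

For example, on_separate_storage(datadir, os.path.join(datadir, "pg_xlog")) would tell you whether the transaction logs need their own free-space check.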

I'd quite like to have a SELECT * FROM pg_get_free_diskspace that reported the main datadir space, and any mount points or junction points within it like for pg_xlog or pg_clog, and also reported each tablespace and any mount points within it. It'd be a set-returning function. Someone who cares enough would have to bother to implement it for all target platforms though, and right now, nobody wants it enough to do the work.


In the meantime, if you're willing to simplify your needs to:

  • One file system
  • Target OS is UNIX/POSIX-compatible like Linux
  • There's no quota system enabled
  • There's no root-reserved block percentage
  • inode exhaustion is not a concern

then you can CREATE LANGUAGE plpython3u; and CREATE FUNCTION a LANGUAGE plpython3u function that does something like:

import os
st = os.statvfs(datadir_path)
return st.f_bavail * st.f_frsize

in a function that returns bigint and either takes datadir_path as an argument, or discovers it by doing an SPI query like SELECT setting FROM pg_settings WHERE name = 'data_directory' from within PL/Python.
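
Put together, the function body might look like the following sketch. The name datadir_free_bytes is made up, and plpy is passed in as a parameter only so the code can be run outside the database; inside a real PL/Python function the plpy module is already available and you would call it directly.

```python
import os

def datadir_free_bytes(datadir_path=None, plpy=None):
    """Bytes available to unprivileged processes on the data directory's file system."""
    if datadir_path is None:
        # Discover the data directory through SPI when no path is given.
        rows = plpy.execute(
            "SELECT setting FROM pg_settings WHERE name = 'data_directory'"
        )
        datadir_path = rows[0]["setting"]
    st = os.statvfs(datadir_path)
    return st.f_bavail * st.f_frsize
```

The result fits the bigint return type, and the same simplifying assumptions listed above (one file system, no quotas, no inode concerns) still apply.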

If you want to support Windows too, see "Cross-platform space remaining on volume using python". I'd use Windows Management Instrumentation (WMI) queries rather than using ctypes to call the Windows API, though.
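
Another option, assuming the server's PL/Python is built against Python 3.3 or later, is the standard library's shutil.disk_usage, which works on both POSIX systems and Windows. A minimal sketch, with free_bytes as a made-up name:

```python
import shutil

def free_bytes(path):
    """Free bytes on the volume containing `path`, on POSIX or Windows."""
    # disk_usage returns a named tuple (total, used, free), all in bytes.
    return shutil.disk_usage(path).free
```

This avoids both WMI and ctypes, at the cost of reporting only total, used and free bytes, with no inode or quota information.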

Or you could use this function someone wrote in PL/Perlu to do it using df and mount command output parsing, which will probably only work on Linux, but hey, it's prewritten.


Here is a simple way to get free disk space without any extension language: just define a function in PL/pgSQL.

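-- Returns one text[] per line of df output (header line stripped).
CREATE OR REPLACE FUNCTION sys_df() RETURNS SETOF text[]
LANGUAGE plpgsql AS $$
BEGIN
    -- Scratch table for the raw df lines; dropped at end of transaction.
    CREATE TEMP TABLE IF NOT EXISTS tmp_sys_df (content text) ON COMMIT DROP;
    -- Clear rows left by an earlier call in the same transaction.
    TRUNCATE tmp_sys_df;
    -- The command runs on the server, as the OS user that runs PostgreSQL.
    COPY tmp_sys_df FROM PROGRAM 'df | tail -n +2';
    RETURN QUERY SELECT regexp_split_to_array(content, '\s+') FROM tmp_sys_df;
END;
$$;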

Usage:

select * from sys_df();
                          sys_df                               
-------------------------------------------------------------------
 {overlay,15148428,6660248,7695656,46%,/}
 {overlay,15148428,6660248,7695656,46%,/}
 {tmpfs,65536,0,65536,0%,/dev}
 {tmpfs,768284,0,768284,0%,/sys/fs/cgroup}
 {/dev/sda2,15148428,6660248,7695656,46%,/etc/resolv.conf}
 {/dev/sda2,15148428,6660248,7695656,46%,/etc/hostname}
 {/dev/sda2,15148428,6660248,7695656,46%,/etc/hosts}
 {shm,65536,8,65528,0%,/dev/shm}
 {/dev/sda2,15148428,6660248,7695656,46%,/var/lib/postgresql/data}
 {tmpfs,65536,0,65536,0%,/proc/kcore}
 {tmpfs,65536,0,65536,0%,/proc/timer_list}
 {tmpfs,65536,0,65536,0%,/proc/sched_debug}
 {tmpfs,768284,0,768284,0%,/sys/firmware}
(13 rows)

If all your data lives under the same path on disk, use df $PGDATA | tail -n +2 instead of df | tail -n +2 (this assumes PGDATA is set in the server's environment). In that case the function returns only one row: the disk usage of the file system holding $PGDATA.
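
The rows returned by sys_df() are untyped text, so a client usually has to convert them. Here is a rough Python sketch for parsing one df line on the application side. The names DfRow and parse_df_line are made up, and it assumes the six-column layout shown in the output above; a file system name containing spaces would break it.

```python
from collections import namedtuple

DfRow = namedtuple(
    "DfRow", "filesystem blocks_1k used_1k available_1k use_percent mounted_on"
)

def parse_df_line(line):
    """Parse one data line of df output into typed fields."""
    # Split on whitespace at most 5 times so a mount point with spaces stays whole.
    parts = line.strip().split(None, 5)
    if len(parts) != 6:
        raise ValueError("unexpected df line: %r" % line)
    fs, blocks, used, avail, pct, mount = parts
    # df prints "-" instead of a percentage for some pseudo file systems.
    percent = None if pct == "-" else int(pct.rstrip("%"))
    return DfRow(fs, int(blocks), int(used), int(avail), percent, mount)
```

With the default df output the size columns are in 1K blocks, so multiply available_1k by 1024 to get bytes.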

NOTE FOR SECURITY

PROGRAM runs an arbitrary command through the shell, so it is a double-edged sword. It is best to use a fixed command string, or at least to avoid putting any user input into it. See the documentation for details.
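
If a path really must be interpolated into the command, for instance by a client that generates the statement, shell-quoting it first reduces the risk. A minimal Python sketch, where build_df_command is a hypothetical helper:

```python
import shlex

def build_df_command(path):
    """Build a df pipeline with `path` safely quoted for the shell."""
    # shlex.quote wraps the value in single quotes when it contains anything
    # the shell could interpret; "--" stops df from treating it as an option.
    return "df -P -- " + shlex.quote(path) + " | tail -n +2"
```

This only covers shell quoting. Embedding the resulting string in a COPY ... FROM PROGRAM statement needs SQL literal quoting as a separate step, so a fixed command string remains the safer choice.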


Here's a plpython2u implementation we've been using for a while.

-- NOTE this function is a security definer, so it carries the superuser permissions
-- even when called by the plebs.
-- (required so we can access the data_directory setting.)
CREATE OR REPLACE FUNCTION get_tablespace_disk_usage()
    RETURNS TABLE (
        path VARCHAR,
        bytes_free BIGINT,
        total_bytes BIGINT
    )
AS $$
import os

data_directory = plpy.execute("select setting from pg_settings where name='data_directory';")[0]['setting']
records = []

for t in plpy.execute("select spcname, spcacl, pg_tablespace_location(oid) as path from pg_tablespace"):
    if t['spcacl']:
        # TODO handle ACLs. For now only show public tablespaces.
        continue

    name = t['spcname']
    if name == 'pg_default':
        # pg_default lives in the "base" subdirectory of the data directory.
        path = os.path.join(data_directory, 'base')
    elif name == 'pg_global':
        path = os.path.join(data_directory, 'global')
    else:
        path = t['path']

    # Skip tablespaces whose directory is missing on this host.
    if os.path.exists(path):
        s = os.statvfs(path)
        total_bytes = s.f_blocks * s.f_frsize
        bytes_free = s.f_bavail * s.f_frsize

        records.append((path, bytes_free, total_bytes))

return records

$$ LANGUAGE plpython2u STABLE SECURITY DEFINER;

Usage is something like:

SELECT path, bytes_free, total_bytes FROM get_tablespace_disk_usage();