How to list files in a folder from inside Postgres?

Extended version of this answer, function ls_files_extended:

-- Unfortunately that variant only allow use hardcoded path
-- To use user parameter we will use dynamic EXECUTE.
-- Return also file size and allow filtering
--
-- @param path text. Filesystem path for read to
-- @param filter text (default null meaning return all). Where condition to filter files. F.e.: $$filename LIKE '0%'$$
-- @param sort text (default filename).
--
-- Examples of use:
-- 1) Simple call, return all files, sort by filename:
-- SELECT * FROM ls_files_extended('/pg_xlog.archive')
-- 2) Return all, sort by filesize:
-- SELECT * FROM ls_files_extended('/pg_xlog.archive', null, 'size ASC')
-- 3) Use filtering and sorting:
-- SELECT * FROM ls_files_extended('/pg_xlog.archive', 'filename LIKE ''0%''', 'size ASC')
-- or use $-quoting for easy readability:
-- SELECT * FROM ls_files_extended('/pg_xlog.archive', $$filename LIKE '0%'$$, 'size ASC')
CREATE OR REPLACE FUNCTION ls_files_extended(path text, filter text default null, sort text default 'filename')
    RETURNS TABLE(filename text, size bigint) AS
$BODY$
BEGIN
  SET client_min_messages TO WARNING;
  CREATE TEMP TABLE _files(filename text, size bigint) ON COMMIT DROP;

  EXECUTE format($$COPY _files FROM PROGRAM 'find %s -maxdepth 1 -type f -printf "%%f\t%%s\n"'$$, path);

  RETURN QUERY EXECUTE format($$SELECT * FROM _files WHERE %s ORDER BY %s $$, concat_ws(' AND ', 'true', filter), sort);
END;
$BODY$ LANGUAGE plpgsql SECURITY DEFINER;

Using PostgreSQL 9.3, it is possible to avoid the overhead of installing a language extension:

DROP TABLE IF EXISTS files;
CREATE TABLE files(filename text);
COPY files FROM PROGRAM 'find /usr/bin -maxdepth 1 -type f -printf "%f\n"'; 
SELECT * FROM files ORDER BY filename ASC;

Creates a table with 2,000+ rows from [ to zip.

Normally the COPY command requires superuser privileges. Since the path to the file system is hard-coded (i.e., not an unsanitized value from users), it doesn't pose a great security risk to define the function first using a superuser account (e.g., postgres) as follows:

CREATE OR REPLACE FUNCTION files()
  RETURNS SETOF text AS
$BODY$
BEGIN
  SET client_min_messages TO WARNING;
  DROP TABLE IF EXISTS files;
  CREATE TEMP TABLE files(filename text);
  COPY files FROM PROGRAM 'find /usr/bin -maxdepth 1 -type f -printf "%f\n"';
  RETURN QUERY SELECT * FROM files ORDER BY filename ASC;
END;
$BODY$
  LANGUAGE plpgsql SECURITY DEFINER;

Log in to PostgreSQL using a non-superuser account, then:

SELECT * FROM files();

The same list of results should be returned without any security violation errors.


The SECURITY DEFINER tells PostgreSQL to run the function under the role of the account that was used to create the function. Since it was created using a superuser role, it will execute with superuser permissions, regardless of the role that executes the command.

The SET client_min_messages TO WARNING; tells PostgreSQL to suppress messages if the table cannot be dropped. It's okay to delete this line.

The CREATE TEMP TABLE is used to create a table that does not need to persist over time. If you need a permanent table, remove the TEMP modifier.

The 'find...' command, which could also be /usr/bin/find, lists only files (type -f) and displays only the filename without the leading path separated one filename per line (-printf "%f\n"). Finally, -maxdepth 1 limits the file search to only the specified directory without searching any subdirectories. See find's man page for details.


One disadvantage to this approach is that there doesn't seem to be a way to parameterize the command to execute. It seems that PostgreSQL requires it to be a text string, rather than an expression statement. Perhaps this is for the best as it prevents allowing arbitrary commands to be executed. What you see is what you execute.

Tags:

Postgresql