Are there performance issues storing files in PostgreSQL?

The large object (LO) facility stores data in 2KB chunks, as rows of the pg_largeobject system table held in standard PostgreSQL heap pages (8KB by default). The data is not stored as an independent, cohesive file in the file system: you couldn't locate the file on disk and do a byte-by-byte comparison expecting it to match the original data you loaded, since the chunks are interleaved with Postgres heap page headers and the row structures that delineate them.
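
A minimal sketch of observing that chunking with psycopg2 (the connection string and payload are illustrative, and reading pg_largeobject directly typically requires superuser rights):

```python
import psycopg2

# Hypothetical connection; adjust dbname/user for your environment.
conn = psycopg2.connect("dbname=test")

# Create a large object and write ~5KB into it (oid=0 lets the server assign one).
lobj = conn.lobject(0, "wb")
lobj.write(b"x" * 5000)
oid = lobj.oid
lobj.close()

# The data lands in pg_largeobject as rows of at most 2KB (LOBLKSIZE = BLCKSZ/4).
with conn.cursor() as cur:
    cur.execute(
        "SELECT pageno, length(data) FROM pg_largeobject"
        " WHERE loid = %s ORDER BY pageno",
        (oid,),
    )
    for pageno, size in cur.fetchall():
        print(pageno, size)  # expect something like: 0 2048 / 1 2048 / 2 904

conn.commit()
```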

You should avoid the Large Object (LO) interface if your application needs to update the binary data frequently, particularly with many small, random-access writes. Because of the way PostgreSQL implements concurrency control (MVCC), every update creates a new version of the affected 2KB chunk row and leaves the old one behind as a dead tuple, so disk usage can explode until you VACUUM the database. The same outcome probably also applies to data stored inline in a bytea column, TOASTed or not.
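
A rough sketch of how that bloat shows up, again with psycopg2 (the connection string and sizes are illustrative; pg_total_relation_size is a standard function):

```python
import psycopg2

conn = psycopg2.connect("dbname=test")  # hypothetical connection

def lo_footprint() -> int:
    # On-disk size of the pg_largeobject table, dead row versions included.
    with conn.cursor() as cur:
        cur.execute("SELECT pg_total_relation_size('pg_largeobject')")
        return cur.fetchone()[0]

before = lo_footprint()

# A 1MB object is ~500 chunks; each 1-byte random-access write below
# rewrites an entire 2KB chunk row, leaving the old version as a dead tuple.
lobj = conn.lobject(0, "wb")
lobj.write(b"\0" * 1_000_000)
for offset in range(0, 1_000_000, 2048):
    lobj.seek(offset)
    lobj.write(b"!")
conn.commit()

# Growth far exceeds the 1MB of live data until VACUUM reclaims the dead tuples.
print(lo_footprint() - before)
```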

However, if your data follows a Write-Once-Read-Many pattern (e.g. upload a PNG image and never modify it afterwards), it should be fine from the standpoint of disk usage.

See this pgsql-general mailing list thread for further discussion.


You have basically two choices. You can store the data right in the row (a bytea column), or you can use the large object facility. Since PostgreSQL uses a mechanism called TOAST to move large field values out of the table automatically, there should be no performance penalty associated with storing large data in the row directly. There remains a 1 GB limit on the size of a field. If that is too limiting, or if you want a streaming API, you can use the large object facility, which gives you something more like file descriptors in the database: you store the LO's OID in your column and can read and write through that OID.
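
A sketch of both options with psycopg2 (table names, file names, and the connection string are placeholders):

```python
import psycopg2

conn = psycopg2.connect("dbname=test")  # hypothetical connection

with conn.cursor() as cur:
    # Option 1: a bytea column. TOAST moves large values out of line for you.
    cur.execute("CREATE TABLE IF NOT EXISTS images (id serial PRIMARY KEY, png bytea)")
    with open("photo.png", "rb") as f:  # placeholder file
        cur.execute("INSERT INTO images (png) VALUES (%s)", (f.read(),))

    # Option 2: a large object. Stream data in, keep only the OID in your row.
    lobj = conn.lobject(0, "wb")
    with open("video.mp4", "rb") as f:  # placeholder file
        while chunk := f.read(65536):
            lobj.write(chunk)
    oid = lobj.oid
    lobj.close()
    cur.execute("CREATE TABLE IF NOT EXISTS videos (id serial PRIMARY KEY, data_oid oid)")
    cur.execute("INSERT INTO videos (data_oid) VALUES (%s)", (oid,))

conn.commit()

# Later: reopen by OID and read incrementally, file-descriptor style.
lobj = conn.lobject(oid, "rb")
first_chunk = lobj.read(8192)
conn.commit()
```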

I personally would suggest you avoid the large object facility unless you absolutely need it. With TOAST, most use cases are covered by just using the database the way you'd expect. With large objects, you take on an extra maintenance burden: you have to keep track of the LO OIDs you've used and be sure to unlink them when they're no longer referenced (but not before), or they'll sit in your data directory taking up space forever. There are also a lot of facilities with exceptional behavior around large objects, the details of which escape me because I never use them.
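
The unlinking has to happen explicitly when you delete the referencing row; a sketch (same hypothetical videos table as above):

```python
import psycopg2

conn = psycopg2.connect("dbname=test")  # hypothetical connection

with conn.cursor() as cur:
    # Grab the OID before deleting the row that references it...
    cur.execute("SELECT data_oid FROM videos WHERE id = %s", (1,))
    row = cur.fetchone()
    if row:
        cur.execute("DELETE FROM videos WHERE id = %s", (1,))
        # ...then unlink the large object itself, or it lingers forever.
        cur.execute("SELECT lo_unlink(%s)", (row[0],))

conn.commit()
```

If orphans do pile up, the vacuumlo utility shipped with PostgreSQL can find and unlink large objects no longer referenced from any oid column, and the lo contrib module provides a lo_manage trigger that does the unlinking automatically on UPDATE and DELETE.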

For most people, the big performance penalty of storing large data in the database is that the ORM will fetch the large column on every query unless you specifically instruct it not to. Take care to tell Hibernate (or whatever you're using) to treat these columns as large and fetch them only when they're specifically requested.
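
In SQLAlchemy, for example, this is a deferred column; a sketch (model and column names are made up):

```python
from sqlalchemy import Column, Integer, LargeBinary, String
from sqlalchemy.orm import declarative_base, deferred

Base = declarative_base()

class Document(Base):
    __tablename__ = "documents"

    id = Column(Integer, primary_key=True)
    name = Column(String)
    # deferred(): left out of the default SELECT; a separate query fetches it
    # only when doc.payload is actually accessed.
    payload = deferred(Column(LargeBinary))
```

Queries that only touch id and name then never pull the blob over the wire.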