PostgreSQL: Force data into memory

Postgres 9.4 finally added an extension to preload data from relations into the operating system cache or the database buffer cache (your choice):

pg_prewarm

This allows reaching full operating performance more quickly.

Run once in your database (see the manual for detailed instructions):

CREATE EXTENSION pg_prewarm;

Then it's simple to preload any given relation. Basic example:

SELECT pg_prewarm('my_tbl');

This finds the first table named my_tbl in the search path and loads it into the Postgres buffer cache.

Or:

SELECT pg_prewarm('my_schema.my_tbl', 'prefetch');

prefetch issues asynchronous prefetch requests to the operating system, if this is supported, or throws an error otherwise.
read reads the requested range of blocks; unlike prefetch, this is synchronous and supported on all platforms and builds, but may be slower.
buffer reads the requested range of blocks into the database buffer cache.

The default is buffer, which has the greatest impact (higher cost, best effect).
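For illustration, here is a minimal sketch of the three modes side by side, assuming a table my_schema.my_tbl with a primary key index my_schema.my_tbl_pkey (both names are placeholders):

-- Ask the OS to prefetch the table asynchronously (errors out where unsupported):
SELECT pg_prewarm('my_schema.my_tbl', 'prefetch');

-- Read the table synchronously into the OS cache (works on all platforms, may be slower):
SELECT pg_prewarm('my_schema.my_tbl', 'read');

-- Load the table into the Postgres buffer cache (the default mode):
SELECT pg_prewarm('my_schema.my_tbl', 'buffer');

-- Indexes are relations too and can be prewarmed the same way;
-- the return value is the number of blocks processed:
SELECT pg_prewarm('my_schema.my_tbl_pkey');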

Read the manual for more details.
Depesz blogged about it, too.


You may be interested in one of the mailing list threads, answered by Tom Lane (core dev):

[..] But my opinion is that people who think they are smarter than an LRU caching algorithm are typically mistaken. If the table is all that heavily used, it will stay in memory just fine. If it's not sufficiently heavily used to stay in memory according to an LRU algorithm, maybe the memory space really should be spent on something else. [..]

You might also be interested in an SO question: https://stackoverflow.com/questions/486154/postgresql-temporary-tables and maybe the more suitable https://stackoverflow.com/questions/407006/need-to-load-the-whole-postgresql-database-into-the-ram


In the general case, if you have enough RAM you can generally trust the database service to do a good job of keeping the things you regularly use in RAM. Some systems allow you to hint that a table should always be held in RAM (which is useful for smallish tables that are not used often, but which need to respond as quickly as possible when they are used). If pgsql has such table hints, you need to be very careful with them: pinning a table reduces the amount of memory available for caching anything else, so you might slow down your application overall.

If you are looking to prime the database's page cache on startup (for instance after a reboot or other maintenance operation that causes the DB to forget everything that is cached), then write a script that does the following:

SELECT * FROM <table>;
SELECT <primary key fields> FROM <table> ORDER BY <primary key fields>;
SELECT <indexed fields> FROM <table> ORDER BY <indexed fields>;

(that last step is repeated for each index, of course, and be careful to have the fields in the ORDER BY clause in the right order)

After running the above, every data and index page should have been read and so will be in the RAM page cache (for the time being, at least). We have scripts like this for our application databases, run after a reboot so that the first users logging into the system afterwards don't experience slower responsiveness. You are better off hand-writing any such script instead of generating it from the DB definition tables (like sys.objects/sys.indexes/sys.columns in MSSQL); that way you can selectively scan just the indexes that are most commonly used rather than scanning everything, which would take longer.
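As a concrete illustration, here is a minimal sketch of such a warm-up script for a hypothetical orders table with primary key order_id and a secondary index on customer_id (the table, column, and index names are placeholders; substitute your own):

-- Read every heap page of the table:
SELECT * FROM orders;

-- Walk the primary key index in key order:
SELECT order_id FROM orders ORDER BY order_id;

-- Repeat for each secondary index, one statement per index:
SELECT customer_id FROM orders ORDER BY customer_id;

On Postgres 9.4 or later, the pg_prewarm calls shown earlier achieve the same effect more directly.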