OOM despite available memory (cache)

Solution 1:

  1. For the love of everything good in the world, configure swap space on your servers.
    You really need swap space. I'm not the only one who says so, it's pretty much a universal truth around here.
    You should of course have enough RAM that your database server isn't swapping regularly -- if you don't, the solution is money (which you take to your vendor and use to acquire more RAM).

  2. Since you now have adequate RAM, and swap to use if something goes wrong, you can disable the OOM killer (by disabling memory overcommit), like the Postgres people tell you to.
    (You can also apply their alternate solution and tell the OOM-Killer to never kill Postgres - but then you're just playing Russian Roulette with the rest of your system's processes...)

  3. (optional) Write an answer on Server Fault detailing why the default behavior in most Linux distributions is Bad, Wrong, and violates the POSIX specification for how malloc() is supposed to behave. Repeat it until everyone is sick of hearing about it.
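
To put a number on step 2: with overcommit disabled (vm.overcommit_memory=2) the kernel refuses allocations beyond a commit limit that, per the kernel documentation, is roughly swap plus a percentage of RAM (vm.overcommit_ratio, default 50). Here is a rough sketch of that arithmetic. The function name and the 8 GiB / 4 GiB server are made up for illustration, so plug in your own numbers:

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50, hugetlb_kb=0):
    """Approximate CommitLimit (in kB) when vm.overcommit_memory=2.

    Mirrors the formula from the kernel's overcommit accounting docs:
    (RAM - huge pages) * overcommit_ratio / 100 + swap.
    """
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


# Hypothetical server: 8 GiB of RAM and 4 GiB of swap (both in kB).
ram_kb = 8 * 1024 * 1024
swap_kb = 4 * 1024 * 1024

default_limit = commit_limit_kb(ram_kb, swap_kb)                        # ratio 50
raised_limit = commit_limit_kb(ram_kb, swap_kb, overcommit_ratio=90)   # ratio 90

print(default_limit, raised_limit)
```

Note that without swap and with the default ratio, only half of your RAM could be committed, which is one more reason to do step 1 first (or to raise vm.overcommit_ratio).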

Also note that the kernel's cached memory is available to postgres (or any other application) to use - you should factor it as free/available memory in your calculations.
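
If you want that calculation in script form, something like the sketch below works on the text of /proc/meminfo. It prefers the kernel's own MemAvailable estimate when the kernel provides one (3.14 and later) and otherwise falls back to free + buffers + cached. The function names and the sample values are mine (only the Cached figure comes from the question), and the fallback should be read as an upper bound, since not everything under Cached is necessarily reclaimable:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict (values in kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            fields[name.strip()] = int(rest.split()[0])
    return fields


def usable_kb(fields):
    """Estimate memory an application could still get, in kB."""
    if "MemAvailable" in fields:
        # Newer kernels compute this estimate themselves.
        return fields["MemAvailable"]
    # Older kernels: free memory plus caches the kernel can usually drop.
    return fields["MemFree"] + fields.get("Buffers", 0) + fields.get("Cached", 0)


# Illustrative sample; on a real box use open("/proc/meminfo").read() instead.
SAMPLE = """MemTotal: 8165000 kB
MemFree: 120000 kB
Buffers: 30000 kB
Cached: 6481852 kB
"""

estimate = usable_kb(parse_meminfo(SAMPLE))
print(estimate)
```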

If I had to hazard a guess at what's happening here I'd say you've got a complex query, Postgres is requesting RAM to execute it, and rather than saying "I don't have that RAM" Linux tells Postgres "Sure, you can have it."
Then, when Postgres actually tries to use the RAM it was (allegedly) given, Linux realizes it doesn't HAVE the RAM it promised Postgres (because it's overcommitted) - the OOM killer is told to free up the RAM, and dutifully kills the program using the most memory -- your database server.

Postgres is a well-designed program. If it's told it can't have the RAM it's requesting it will handle that gracefully (either by making do with less, or aborting with a message to the user).

Solution 2:

It appears you (and I in a case with very similar symptoms) have truly run out of memory and have been confused by the cached number.

There apparently are cases where Linux does not free a large disk cache when memory demand goes up.

In particular (I don't really understand why), postgres' shared_buffers may be reported under "Cached" (the page cache). In your case the 6481852k cached in top matches this line in the OOM-killer's log:

Jun 10 05:45:25 db kernel: [11209156.840788] 1615243 total pagecache pages

(1615243*4KB ~= 6481852k) - meaning the page cache indeed was not dropped before invoking OOM-killer.
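
For anyone who wants to check that arithmetic, here it is spelled out. The only assumption is the 4 KiB page size (the usual x86_64 default); the two input numbers are the ones from the OOM-killer log and from top:

```python
PAGE_SIZE_KB = 4  # assumption: standard 4 KiB pages

pagecache_pages = 1615243    # "total pagecache pages" from the OOM-killer log
cached_in_top_kb = 6481852   # "cached" as reported by top

pagecache_kb = pagecache_pages * PAGE_SIZE_KB
diff_kb = cached_in_top_kb - pagecache_kb
relative_diff = diff_kb / cached_in_top_kb

# The two figures were sampled at different moments, so a small gap is expected.
print(pagecache_kb, diff_kb, f"{relative_diff:.2%}")
```

The two numbers differ by about 20 MB out of more than 6 GB, so they are almost certainly describing the same memory.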

Yet there are few file-backed pages (I assume active_file:98 inactive_file:168 is similar to /proc/meminfo's Active(file)/Inactive(file)), so it's not the discardable pages we know and love.

The post at https://www.depesz.com/2012/06/09/how-much-ram-is-postgresql-using/ demonstrates an example session where shutting down postgres leads to reduction of "cached" by the size of shared_buffers (scroll to "And most of it came off disk cache – as expected, because it was used for shared_buffers.") - unfortunately it doesn't indicate the version of postgres nor the kernel that was used for the experiment.

I'm using 3.13.0-67 x86_64 with PG 9.3. In 9.3 they switched from using Sys V shared memory (shmget) to anonymous mmap(...R+W, MAP_SHARED|MAP_ANONYMOUS|MAP_HASSEMAPHORE...)+fork() (in 9.4 this became configurable via dynamic_shared_memory_type). But I couldn't find any explanation of why these mmap()s are supposed to show up in "cached"; the only thing I found is https://access.redhat.com/solutions/406773, which says "Cached: Memory in the pagecache (Diskcache and Shared Memory)"

Given that there are many kinds of shared memory I'm both enlightened and confused...
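
One way to see how much of "Cached" is really discardable file cache is to subtract the Shmem line of /proc/meminfo from it. This is only a sketch: it assumes your kernel exposes Shmem (2.6.32 and later should) and that shared anonymous mappings such as shared_buffers are accounted there, which matches the Red Hat description above but which I have not verified on every kernel. The function name and the Shmem value in the sample are hypothetical; only the Cached figure is from the question:

```python
def split_cached(meminfo_text):
    """Split "Cached" into shared memory and (roughly) droppable file cache, in kB."""
    values = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            values[name.strip()] = int(rest.split()[0])
    cached = values["Cached"]
    shmem = values.get("Shmem", 0)  # tmpfs plus shared anonymous mappings
    return {
        "cached_kb": cached,
        "shmem_kb": shmem,
        "file_cache_kb": cached - shmem,  # what the kernel can actually discard
    }


# Illustrative input; on a real box use open("/proc/meminfo").read() instead.
SAMPLE = """Cached: 6481852 kB
Shmem: 6291456 kB
"""

result = split_cached(SAMPLE)
print(result)
```

If file_cache_kb comes out small while cached_kb is large, the box is in the situation described above: most of the "cache" is shared memory that cannot simply be dropped.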