OOM killer not working?

From the official /proc/sys/vm/* documentation:

oom_kill_allocating_task

This enables or disables killing the OOM-triggering task in out-of-memory situations.

If this is set to zero, the OOM killer will scan through the entire tasklist and select a task based on heuristics to kill. This normally selects a rogue memory-hogging task that frees up a large amount of memory when killed.

If this is set to non-zero, the OOM killer simply kills the task that triggered the out-of-memory condition. This avoids the expensive tasklist scan.

If panic_on_oom is selected, it takes precedence over whatever value is used in oom_kill_allocating_task.

The default value is 0.

To summarize: with oom_kill_allocating_task set to 1, the kernel skips the slow, expensive scan for a process to kill and simply kills the process that triggered the out-of-memory condition.

In my experience, when an OOM condition is triggered, the kernel no longer has enough "strength" left to perform that scan, which leaves the system totally unusable.

Killing the task that caused the problem also seems like the more obvious choice, so I fail to understand why the default is 0.

For testing, you can write to the corresponding pseudo-file in /proc/sys/vm/; the change is undone on the next reboot:

echo 1 | sudo tee /proc/sys/vm/oom_kill_allocating_task
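To confirm that the write took effect, and that panic_on_oom (which takes precedence, per the documentation quoted above) is not overriding it, you can read both values back. This is only a minimal sketch: show_oom_settings is a hypothetical helper name, and the optional directory argument exists only so the function can be pointed somewhere other than /proc/sys/vm.

```shell
# Hypothetical helper: print the current OOM-related settings.
# Optional first argument: directory to read from (defaults to /proc/sys/vm).
show_oom_settings() {
    dir="${1:-/proc/sys/vm}"
    for name in oom_kill_allocating_task panic_on_oom; do
        if [ -r "$dir/$name" ]; then
            printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf '%s = (not readable)\n' "$name"
        fi
    done
}

show_oom_settings
```

After the echo command above, the first line should show 1. A non-zero panic_on_oom means the kernel panics on OOM regardless of this setting.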

For a permanent fix, add the following line to /etc/sysctl.conf, or to a new file with a .conf extension under /etc/sysctl.d/ (for example, /etc/sysctl.d/local.conf):

vm.oom_kill_allocating_task = 1
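The file is normally only read at boot, so it can be useful to compare what the file says with what the kernel is currently using. The following is a rough sketch, not an official tool: configured_value is a hypothetical helper, and the path is just the example file name mentioned above.

```shell
# Hypothetical helper: print the value that a sysctl config file assigns to a key.
# Usage: configured_value KEY FILE
configured_value() {
    key="$1"
    file="$2"
    # Split each line on "=", trim whitespace, and keep the last assignment of KEY.
    awk -F= -v key="$key" '
        {
            k = $1; v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
        }
        k == key { last = v }
        END { if (last != "") print last }
    ' "$file"
}

conf="/etc/sysctl.d/local.conf"   # example path; adjust to the file you created
if [ -r "$conf" ]; then
    echo "configured: $(configured_value vm.oom_kill_allocating_task "$conf")"
fi
echo "running:    $(cat /proc/sys/vm/oom_kill_allocating_task 2>/dev/null || echo unknown)"
```

If the two values differ, running sudo sysctl --system reloads the configuration files without a reboot.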

Update: The bug is fixed.

Teresa's answer is a good workaround for the problem.

Additionally, I've filed a bug report, because this is definitely broken behavior.


You can try earlyoom, a user-space OOM killer that tries to kill the largest process in an OOM situation.
