How do I tell what process is causing kswapd to be in use?
kswapd is the kernel thread that reclaims memory (including paging out to swap) when the combined demand from all processes exceeds what is physically available.
It is process-agnostic: it is only interested in which pages are accessed and when (it is more complex than this, of course, but to keep things simple we may as well view it this way).
So the real question is "which processes put the greatest burden on memory, forcing kswapd to page all the time?"
That is most easily answered by running 'top' and switching to its memory-usage sort mode.
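You can script it, but you can also do it via top.
Run top, then press O followed by p, then Enter (this is for older procps versions of top; newer ones pick the sort field from the f screen instead).
Now all the processes are sorted by swap usage and you can see which ones are using it.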
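For the "script it" route, here is a minimal sketch in Python. It assumes a Linux kernel that exposes a VmSwap line in /proc/<pid>/status (kernel threads have no such line and are counted as 0). The function names parse_status and swap_by_process are made up for this example and are not part of any tool.

```python
#!/usr/bin/env python3
"""Rank processes by swap usage, read from /proc/<pid>/status (Linux only)."""
import os


def parse_status(text):
    """Return (name, swap_kb) parsed from the text of a /proc/<pid>/status file."""
    name, swap_kb = "?", 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmSwap":
            # The value looks like "     512 kB"; keep only the number.
            swap_kb = int(value.split()[0])
    return name, swap_kb


def swap_by_process(proc_root="/proc"):
    """Return a list of (swap_kb, pid, name) tuples, largest swap user first."""
    rows = []
    if not os.path.isdir(proc_root):
        return rows
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        status_path = os.path.join(proc_root, entry, "status")
        try:
            with open(status_path, errors="replace") as fh:
                name, swap_kb = parse_status(fh.read())
        except OSError:
            continue  # the process exited, or we lack permission
        if swap_kb > 0:
            rows.append((swap_kb, int(entry), name))
    rows.sort(reverse=True)
    return rows


if __name__ == "__main__":
    # Print the 15 biggest swap users.
    for swap_kb, pid, name in swap_by_process()[:15]:
        print(f"{swap_kb:>10} kB  {pid:>7}  {name}")
```

Run it as root if you want to be sure every process is readable. An empty result means no process currently has pages in swap, in which case sorting by resident memory in top is the more useful view.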
If you're on Ubuntu 15.10 or greater, this may actually be the result of a bug, especially if your system is a virtual machine lacking a swap partition (e.g., AWS EC2). The problem exists on other distributions, but, as of writing, it's unclear if the same fix works universally.
A temporary workaround:
sudo ln -s /dev/null /etc/udev/rules.d/40-vm-hotadd.rules
sudo reboot
Note that this will disable hotadding RAM/CPUs for Xen and Hyper-V virtual machines.
There also seems to be a bug in kswapd somewhere, hopefully only on older kernels.
Nearly every day now, kswapd randomly goes berserk on some machines in a larger cluster (running a non-current kernel, though): 100% CPU on both kswapd processes. There are no other running processes (except an ssh shell), plenty of free RAM (more than 700 MB), and no swap in use at all. No swap-in and no swap-out either.
Nothing yet explains why one particular machine is hit and another is not. It does not seem to be completely random, because it usually hits more than one machine within a short time span. Machines that are idle, as well as machines that are under high pressure, appear to be less(!) likely to be hit. So it seems to have something to do with the workload, and it only strikes when a machine is neither idle nor very busy.
Once the problem strikes, nothing helps anymore. Killing all processes (those that did not become unkillable), unmounting all filesystems, nothing.
kswapd still stays at 100% CPU. I suspect some spinlock race in SMP kernels, but it's also likely that I am wrong.
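To double-check the "no swap-in, no swap-out" observation on an affected machine, you can sample the pswpin and pswpout counters in /proc/vmstat twice and compare. The following is only a rough sketch: the helper names and the 5-second interval are arbitrary choices for illustration, and it shows whether pages moved to or from swap, not why kswapd is spinning.

```python
#!/usr/bin/env python3
"""Report how many pages were swapped in/out during a short interval (Linux only)."""
import time


def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after, keys=("pswpin", "pswpout")):
    """Return the increase of each counter between two snapshots."""
    return {key: after.get(key, 0) - before.get(key, 0) for key in keys}


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the current counters."""
    with open(path) as fh:
        return parse_vmstat(fh.read())


if __name__ == "__main__":
    before = read_vmstat()
    time.sleep(5)  # sampling interval, chosen arbitrarily
    after = read_vmstat()
    # Both values staying at 0 while kswapd burns CPU matches the symptom above.
    print(swap_delta(before, after))
```

If both deltas stay at zero while kswapd sits at 100% CPU, the time is not going into actual swap traffic, which is consistent with what I see here.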
Perhaps see my answer serverfault.com/questions/316995/#493257
- Rebooting affected machines often fails because the shutdown process starts hanging somewhere.
- There is no direct connection to the Internet. Foreign causes are unlikely.
- It seems to depend on the kind of load the machines process, because we have machines that have never been affected (yet).
- Sorry, I cannot be more specific on what we do and why.
- Yes, I am speculating, because so far it's an extremely puzzling effect.