Debugging Linux I/O latency

In due time I managed to solve this myself, so I can at least follow up on it for posterity.

Unfortunately, I lost the original problem in a kernel upgrade, but gained a new one instead, with even worse performance and just as hard to track down. The techniques I found useful were the following:

First of all, blktrace/blkparse is a tool that I found quite helpful. It lets you trace the progress of individual I/O requests with many helpful details, such as the process that submitted the request. It is helpful to put the output on tmpfs, so that storing the trace doesn't itself generate I/O on the device being traced.
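As a rough sketch (the device name and tmpfs directory are placeholders, not from my actual setup), the workflow looks something like this:

# Write the trace files to tmpfs so collecting them doesn't hit the disk being traced;
# /dev/sda and /tmp are just examples.
mkdir -p /tmp/blktrace && cd /tmp/blktrace
blktrace -d /dev/sda -o sda_trace &
# ... reproduce the latency problem ...
kill %1
blkparse -i sda_trace | less   # decoded events include the submitting PID and command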

That helped only so far, though, so I compiled a kernel with more debugging functionality. In particular, I found ftrace quite helpful, since it allowed me to trace the poorly performing process inside kernel space, to see what it did and where it blocked. Compiling a debug kernel also provides working WCHAN output for ps, which can be an easier way to see what a process is doing inside the kernel, at least for simpler cases.
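For example, a minimal ftrace session for a single process might look like the sketch below (paths assume debugfs is mounted at /sys/kernel/debug, the function_graph tracer requires the corresponding kernel config option, and $PID is a placeholder):

cd /sys/kernel/debug/tracing
echo function_graph > current_tracer   # trace kernel function entry/exit as a call graph
echo $PID > set_ftrace_pid             # restrict tracing to the misbehaving process
echo 1 > tracing_on
# ... reproduce the stall ...
echo 0 > tracing_on
less trace
# With a debug kernel, ps can also show where a process is sleeping in the kernel:
ps -o pid,stat,wchan:32,cmd -p $PID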

I had also hoped LatencyTop would be useful, but I found it quite buggy, and the latency reasons it displayed were too "high-level" to be truly useful, unfortunately.

I also found it more helpful than iostat to simply view the contents of /sys/block/$DEVICE/stat at very close intervals, like this:

while :; do cat /sys/block/sda/stat; sleep .1; done

See Documentation/iostats.txt in the kernel source tree for the format of the stat file. Viewing it at close intervals allowed me to see the exact timing and size of I/O bursts and such things.
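Building on the loop above, a small sketch like the following (assuming a Bourne-compatible shell) diffs successive samples so you see per-interval counts instead of the cumulative counters; field positions follow Documentation/iostats.txt (field 1 is reads completed, field 5 is writes completed, field 11 is the weighted time in queue in milliseconds):

prev_r=0; prev_w=0; prev_q=0
while :; do
    set -- $(cat /sys/block/sda/stat)
    echo "reads: $(( $1 - prev_r ))  writes: $(( $5 - prev_w ))  queue-ms: $(( ${11} - prev_q ))"
    prev_r=$1; prev_w=$5; prev_q=${11}
    sleep .1
done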

In the end, I found out that the problem I had after the kernel upgrade was caused by stable pages, a feature introduced in Linux 3.0 that, in my case, caused Berkeley DB to halt for extended periods when dirtying pages in its mmap'ed region files. While it seems possible to patch this feature out, and the problems it causes might be fixed in Linux 3.9, I have solved the worst problem I had for now by patching Berkeley DB to let me put its region files in a different directory (in my case /dev/shm), allowing me to avoid the problem altogether.


In my experience, the simplest and most detailed statistics tool you can install to trace mysterious system performance issues is sysstat (http://freecode.com/projects/sysstat), a.k.a. sar.

You certainly want to look at iostat output as well, especially your %iowait: it should stay below 5-10% under normal system load (and is often below 1.0 or so).
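A quick sketch (the 5-second interval is just one reasonable choice):

iostat -x 5    # -x adds extended per-device stats such as await and %util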

Look at the ps output: if you see D statuses in the STAT column, those processes are blocked in uninterruptible sleep waiting for I/O, which very likely points to a hardware issue with the controller or the disk. Check S.M.A.R.T. stats as well as dmesg and syslog for clues.
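A rough sketch of that check (/dev/sda is a placeholder; smartctl comes from the smartmontools package):

ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'   # processes stuck in uninterruptible sleep
smartctl -a /dev/sda                             # S.M.A.R.T. health and error counters
dmesg | tail -n 50                               # recent kernel messages, e.g. ATA errors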

Check the sar logs and identify the peak times when this happens, then try to match those times with disk-intensive cron jobs, e.g. backups over the network.
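For example (assuming a Red Hat-style sysstat setup where daily files live in /var/log/sa; the day number is a placeholder, and Debian-based systems use /var/log/sysstat instead):

sar -u -f /var/log/sa/sa15    # CPU utilisation including %iowait for that day
sar -d -f /var/log/sa/sa15    # per-device activity for that day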

You can benchmark your disk performance with bonnie++.
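A minimal sketch (the test directory is a placeholder; -u is only needed when running as root):

bonnie++ -d /mnt/test -u nobody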


Thought I'd mention strace even though this question is now months old. It may help someone with a similar problem who finds this page.

Try:

strace "application"

You can also do

strace -e read,write "application"

to just show read/write events.

The application will load as normal (albeit a bit slower to launch) and you can use it as normal in order to trigger the problem. The output will appear in the shell that you used to launch strace.

The good thing about strace is that you can see the most recent system call at the time the application triggers the slowdown. You may find, for example, that if your /home accounts are on NFS then the application is having some difficulty with file I/O over NFS for some reason.
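If the process is already running, a variant like the following can make the slow calls stand out (-T prints the time spent in each syscall, -tt adds timestamps, -p attaches to an existing process; the PID and syscall list are placeholders):

strace -T -tt -e trace=read,write,fsync -p <pid>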