How to check disk I/O utilization per process?

Solution 1:

If you are lucky enough to catch the next peak utilization period, you can study per-process I/O stats interactively using iotop.

Solution 2:

You can use pidstat to print per-process I/O statistics (rates in kB/s, averaged over each interval) every 20 seconds with this command:

# pidstat -dl 20

Each row will have the following columns:

  • PID - process ID
  • kB_rd/s - Number of kilobytes the task has caused to be read from disk per second.
  • kB_wr/s - Number of kilobytes the task has caused, or shall cause, to be written to disk per second.
  • kB_ccwr/s - Number of kilobytes whose writing to disk has been cancelled by the task. This may occur when the task truncates some dirty pagecache; in that case, some I/O which another task has been accounted for will not happen.
  • Command - The command name of the task.

Output looks like this:

05:57:12 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:57:32 PM       202      0.00      2.40      0.00  jbd2/sda1-8
05:57:32 PM      3000      0.00      0.20      0.00  kdeinit4: plasma-desktop [kdeinit]              

05:57:32 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:57:52 PM       202      0.00      0.80      0.00  jbd2/sda1-8
05:57:52 PM       411      0.00      1.20      0.00  jbd2/sda3-8
05:57:52 PM      2791      0.00     37.80      1.00  kdeinit4: kdeinit4 Running...                   
05:57:52 PM      5156      0.00      0.80      0.00  /usr/lib64/chromium/chromium --password-store=kwallet --enable-threaded-compositing 
05:57:52 PM      8651     98.20      0.00      0.00  bash 

05:57:52 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:58:12 PM       202      0.00      0.20      0.00  jbd2/sda1-8
05:58:12 PM      3000      0.00      0.80      0.00  kdeinit4: plasma-desktop [kdeinit]              
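
To pull the biggest consumers out of a capture like the one above after the fact, a little awk over the saved log works. Here is a minimal sketch; the sample file just replays a few lines from the output above, and in practice you would redirect pidstat into a log file instead (the /tmp paths are arbitrary choices for this example):

```shell
# Sketch: rank processes in saved "pidstat -dl" output by read+write rate.
# The sample lines are copied from the capture above; normally you would
# use something like "pidstat -dl 20 > /tmp/pidstat.log" instead.
cat > /tmp/pidstat_sample.log <<'EOF'
05:57:52 PM       202      0.00      0.80      0.00  jbd2/sda1-8
05:57:52 PM       411      0.00      1.20      0.00  jbd2/sda3-8
05:57:52 PM      2791      0.00     37.80      1.00  kdeinit4: kdeinit4 Running...
05:57:52 PM      8651     98.20      0.00      0.00  bash
EOF
# Fields: time(1,2) PID(3) kB_rd/s(4) kB_wr/s(5) kB_ccwr/s(6) command(7+).
# Print PID and combined read+write rate, busiest first.
awk '{ printf "%s %.2f\n", $3, $4 + $5 }' /tmp/pidstat_sample.log | sort -rn -k 2
```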

Solution 3:

Nothing beats ongoing monitoring; you simply cannot recover time-sensitive data after the event...

However, there are a couple of things you can check to implicate or eliminate suspects; /proc is your friend.

sort -n -k 10 /proc/diskstats
sort -n -k 11 /proc/diskstats

Fields 10 and 11 are accumulated sectors written and accumulated time (in ms) spent writing, respectively. This will show your hot file-system partitions.
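
For illustration, here is how those fields line up. This is a sketch over made-up sample data; on a live system you would point awk at /proc/diskstats itself:

```shell
# Sketch: per-device accumulated sectors written (field 10) and
# milliseconds spent writing (field 11). The two lines below are
# hypothetical sample data in /proc/diskstats layout.
cat > /tmp/diskstats_sample <<'EOF'
   8       0 sda 120000 300 5000000 40000 80000 900 2400000 52000 0 30000 92000
   8       1 sda1 90000 200 3600000 30000 60000 700 1800000 41000 0 22000 71000
EOF
# Field 3 is the device name; fields 10 and 11 are as described above.
awk '{ printf "%s wr_sectors=%s wr_ms=%s\n", $3, $10, $11 }' /tmp/diskstats_sample
```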

cut -d" " -f 1,2,42 /proc/[0-9]*/stat | sort -n -k 3

Those fields are PID, command and cumulative IO-wait ticks. This will show your hot processes, though only if they are still running. (You probably want to ignore your filesystem journalling threads.)
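
The same idea can be sketched with awk, which copes better than cut with the variable whitespace in /proc/<pid>/stat. Field 42 is delayacct_blkio_ticks on 2.6.18+ kernels; note this sketch assumes command names contain no spaces, since a space inside the (...) comm field would shift the field numbering:

```shell
# Sketch: the five processes with the most cumulative block-I/O delay
# ticks (field 42 of /proc/<pid>/stat). Fields 1 and 2 are PID and comm.
# Assumes comm contains no spaces; vanished processes are ignored.
awk '{ print $1, $2, $42 }' /proc/[0-9]*/stat 2>/dev/null | sort -n -k 3 | tail -n 5
```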

The usefulness of the above depends on uptime, the nature of your long running processes, and how your file systems are used.

Caveat: this does not apply to pre-2.6 kernels; check your documentation if unsure.

(Now go and do your future-self a favour, install Munin/Nagios/Cacti/whatever ;-)


Solution 4:

Use atop. (http://www.atoptool.nl/)

Write the data to a compressed file that atop can read later in an interactive style. Take a reading (delta) every 10 seconds, and do it 1080 times (3 hours), so that if you forget about it, the output file won't run you out of disk:

$ atop -a -w historical_everything.atop 10 1080 &
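
If you want the capture to survive reboots, one option is a cron entry. This is a sketch only; the file path and log location are assumptions, not from atop's documentation (the -a, -w, interval, and count arguments are the same as above):

```
# /etc/cron.d/atop-capture (sketch): restart a rolling 3-hour capture at boot.
@reboot root /usr/bin/atop -a -w /var/log/atop_capture.atop 10 1080
```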

After bad thing happens again:

(even if it is still running in the background, it just appends every 10 seconds)

% atop -r historical_everything.atop

Since you said IO, I would hit 3 keys: tdD

t - move forward to the next data gathering (10 seconds)
d - show the disk io oriented information per process
D - sort the processes based on disk activity
T - go backwards 1 data point (10 seconds probably)
h - bring up help
b - jump to a time (nearest prior datapoint) - e.g. b12:00 - only jumps forward
1 - display per second instead of delta since last datapoint in the upper half of the display

Solution 5:

Use btrace. It's easy to use, for example: btrace /dev/sda. If the command is not available, it is probably in the blktrace package.

EDIT: If debugfs is not enabled in your kernel, you might try date >>/tmp/wtf && ps -eo "cmd,pid,min_flt,maj_flt" >>/tmp/wtf or similar. Logging page faults is of course not at all the same as using btrace, but if you are lucky, it MAY give you a hint about the most disk-hungry processes. I just tried it on one of my most I/O-intensive servers and the list included the processes I know consume lots of I/O.
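
In the same spirit, a snapshot-and-diff over /proc avoids even the dependency on ps. A rough sketch follows; the 2-second interval and /tmp paths are arbitrary choices, field 12 of /proc/<pid>/stat is the major-fault count, and command names containing spaces would shift the field numbering:

```shell
# Sketch: diff two snapshots of per-process major-fault counts.
# Field 1 of /proc/<pid>/stat is the PID, field 12 is maj_flt.
awk '{ print $1, $12 }' /proc/[0-9]*/stat 2>/dev/null > /tmp/flt.1
sleep 2
awk '{ print $1, $12 }' /proc/[0-9]*/stat 2>/dev/null > /tmp/flt.2
# Print the PIDs whose major-fault count grew during the interval,
# largest increase last.
awk 'NR == FNR { before[$1] = $2; next }
     ($1 in before) && $2 > before[$1] { print $1, $2 - before[$1] }' \
    /tmp/flt.1 /tmp/flt.2 | sort -n -k 2 | tail -n 5
```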

Tags: linux, io, storage