Understanding top and load average

Load average is usually described as the "average length of the run queue". So a few CPU-consuming processes or threads can raise the load average (LA) above 1. This is not a problem as long as LA stays below the total number of CPU cores. If it rises above the number of cores, some threads or processes are sitting in the queue, ready to run, but waiting for a free CPU.


The load average is calculated from the number of tasks in the runnable or uninterruptible state, sampled periodically and smoothed over each averaging window (1, 5 and 15 minutes). These tasks can be threads of a multithreaded process. Because of the smoothing the algorithm applies, the longer the window, the less the figure tells you about any particular moment.
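On Linux the same figures can be read from /proc/loadavg. The sketch below assumes the five-field layout documented in proc(5): three load averages, a runnable/total count of kernel scheduling entities (threads as well as processes), and the most recently assigned PID. The helper name parse_loadavg and the sample line are made up for illustration.

```python
def parse_loadavg(line):
    """Split one /proc/loadavg line into its five fields (layout assumed from proc(5))."""
    one, five, fifteen, entities, last_pid = line.split()
    runnable, total = entities.split("/")
    return {
        "load_1": float(one),
        "load_5": float(five),
        "load_15": float(fifteen),
        "runnable": int(runnable),   # currently runnable scheduling entities
        "total": int(total),         # all scheduling entities that exist
        "last_pid": int(last_pid),
    }

# Made-up sample line, not real measurement data.
sample = "0.20 0.18 0.12 1/80 11206"
print(parse_loadavg(sample))

# Live values, if the file exists (Linux only).
try:
    with open("/proc/loadavg") as fh:
        print(parse_loadavg(fh.read()))
except OSError:
    pass
```

Note that the runnable count in the fourth field is an instantaneous value and does not include tasks in uninterruptible sleep, so it will not necessarily match the smoothed averages.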

A load of 1 is equal to 100% of one CPU's worth of work. If a multithreaded application manages to have more active threads than there are available CPUs, a single process can drive the load above 1. This would likely be a short-term spike that is not reflected in the longer averaging windows.

Also, since load average was developed before multi-core systems existed, it's important to divide the load numbers by the total number of available cores. A sustained load of 9 on a 4-socket quad-core system is a load of 9 out of 16, and not really a problem.
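As a rough illustration of that division, here is a minimal Python sketch. The helper name load_per_core is hypothetical; os.getloadavg() and os.cpu_count() are standard library calls, and os.cpu_count() reports logical CPUs, which may include hyperthreads.

```python
import os

def load_per_core(load, cores):
    """Return the load average divided by the number of cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores

# The example from the text: load 9 on 4 sockets x 4 cores.
print(load_per_core(9.0, 4 * 4))  # 0.5625

# Live values; os.getloadavg() exists on Unix-like systems only.
try:
    cores = os.cpu_count() or 1
    for label, value in zip(("1 min", "5 min", "15 min"), os.getloadavg()):
        print(f"{label}: {value:.2f} total, {load_per_core(value, cores):.2f} per core")
except (AttributeError, OSError):
    pass
```

A per-core value that stays below 1 means that, on average, there was a free CPU for every task that wanted one.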


See kernel/sched/loadavg.c, which has a long and excellent comment at the start explaining the derivation of load average from an exponentially decaying average of the number of runnable threads (the "run queue") plus the number of uninterruptible threads (waiting on I/O or waiting on a lock).

Here's the essence of the comment, but it is worthwhile reading in full:

 * The global load average is an exponentially decaying average of
 * nr_running + nr_uninterruptible.
 *
 * Once every LOAD_FREQ:
 *     nr_active = 0;
 *     for_each_possible_cpu(cpu)
 *         nr_active += cpu_of(cpu)->nr_running +
 *                      cpu_of(cpu)->nr_uninterruptible;
 *     avenrun[n] = avenrun[0] * exp_n + nr_active * (1 - exp_n)

Real life makes the code somewhat complex: per-CPU counters, tickless kernels, hot-pluggable CPUs, and the lack of floating-point code in the kernel, which requires a fixed-point implementation of the exp_n decay factors. But it's easy to see that these are all working towards faithfully implementing the method described in the comment.
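To get a feel for the fixed-point arithmetic, here is a small Python sketch modelled on the kernel's calc_load() helper. It is a simplification, not the kernel code: the constants (FSHIFT = 11, EXP_1 = 1884, EXP_5 = 2014, EXP_15 = 2037) are assumed to match the kernel headers, so check them against your own tree. The sampling interval is taken as roughly 5 seconds, the simulate() function is hypothetical, and the per-CPU and tickless handling is ignored entirely.

```python
import math

FSHIFT = 11                 # bits of fractional precision (assumed kernel value)
FIXED_1 = 1 << FSHIFT       # 1.0 in fixed point
# Decay factors exp(-5s/1min), exp(-5s/5min), exp(-5s/15min), scaled by FIXED_1.
EXP_1, EXP_5, EXP_15 = 1884, 2014, 2037

def calc_load(load, exp, active):
    """One update step: load * exp + active * (1 - exp), all in fixed point."""
    newload = load * exp + active * (FIXED_1 - exp)
    if active >= load:
        newload += FIXED_1 - 1   # round up while the load is rising
    return newload // FIXED_1

def simulate(nr_active, samples):
    """Feed a constant number of active tasks for `samples` intervals of ~5 s."""
    avenrun = [0, 0, 0]
    active = nr_active * FIXED_1
    for _ in range(samples):
        avenrun = [calc_load(a, e, active)
                   for a, e in zip(avenrun, (EXP_1, EXP_5, EXP_15))]
    return [a / FIXED_1 for a in avenrun]

# The fixed-point constants are just the rounded floating-point decay factors.
print([round(FIXED_1 * math.exp(-5 / (60 * m))) for m in (1, 5, 15)])

# Two busy threads for one minute (12 samples), then for a long time.
print(simulate(2, 12))
print(simulate(2, 2000))
```

With two constantly active tasks, the 1-minute figure reaches only about 63% of its final value after one minute, while the 5- and 15-minute figures lag much further behind. This is the smoothing that makes short spikes disappear from the longer averages.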

You'll note that Linux counts threads, not just processes, which answers your question.