Bash: limit the number of concurrent jobs?

If you have GNU Parallel (http://www.gnu.org/software/parallel/) installed, you can do this:

parallel gzip ::: *.log

which will run one gzip per CPU core until all log files are gzipped. To cap the number of concurrent jobs explicitly, pass -j, e.g. parallel -j4 gzip ::: *.log.

If it is part of a larger loop you can use sem instead:

for i in *.log ; do
    echo "$i" Do more stuff here
    sem -j+0 gzip "$i" ";" echo done
done
sem --wait

It will do the same, but gives you a chance to do more stuff for each file. (-j+0 means one job per CPU core; use e.g. -j3 to cap it at three.)

If GNU Parallel is not packaged for your distribution, you can install it with:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
   fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep da012ec113b49a54e705f86d51e784ebced224fdf
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh

It will download, check the signature, and do a personal installation if it cannot install globally.

Watch the intro videos for GNU Parallel to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1


A small bash script could help you:

# content of script exec-async.sh
joblist=($(jobs -p))
while (( ${#joblist[*]} >= 3 ))
do
    sleep 1
    joblist=($(jobs -p))
done
"$@" &

If you call:

. exec-async.sh sleep 10

...four times, the first three calls return immediately, and the fourth call blocks until fewer than three jobs are running.

You need to source this script in the current session (by prefixing it with .), because jobs lists only the jobs of the current session.

The sleep inside is ugly, but I didn't find a way to wait for the first job that terminates.
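In bash 4.3 and later, the built-in wait -n blocks until some background job terminates, which removes the need for the polling sleep. A minimal sketch of the same limiter wrapped in a function (the name run_limited, the limit of 3, and the sleep commands are just for illustration):

```shell
#!/bin/bash
# Same limiter without the polling sleep: wait -n (bash >= 4.3)
# blocks until some background job terminates.
MAXJOBS=3

run_limited() {
    # Block while MAXJOBS background jobs are still running.
    while (( $(jobs -pr | wc -l) >= MAXJOBS )); do
        wait -n              # returns as soon as one job exits
    done
    "$@" &
}

for i in 1 2 3 4 5; do
    run_limited sleep 0.2
done
wait                         # wait for the remaining jobs
```

Because it uses the same jobs built-in, it has the same restriction: it only sees jobs of the current shell session.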


The following script shows a way to do this with functions. You can either put the bgxupdate() and bgxlimit() functions in your script, or have them in a separate file which is sourced from your script with:

. /path/to/bgx.sh

It has the advantage that you can maintain multiple groups of processes independently (you can run, for example, one group with a limit of 10 and another totally separate group with a limit of 3).

It uses the Bash built-in jobs to get a list of sub-processes but maintains them in individual variables. In the loop at the bottom, you can see how to call the bgxlimit() function:

  1. Set up an empty group variable.
  2. Transfer that to bgxgrp.
  3. Call bgxlimit() with the limit and command you want to run.
  4. Transfer the new group back to your group variable.

Of course, if you only have one group, just use the bgxgrp variable directly rather than transferring in and out.

#!/bin/bash

# bgxupdate - update active processes in a group.
#   Works by transferring each process to new group
#   if it is still active.
# in:  bgxgrp - current group of processes.
# out: bgxgrp - new group of processes.
# out: bgxcount - number of processes in new group.

bgxupdate() {
    bgxoldgrp=${bgxgrp}
    bgxgrp=""
    ((bgxcount = 0))
    bgxjobs=" $(jobs -pr | tr '\n' ' ')"
    for bgxpid in ${bgxoldgrp} ; do
        if echo "${bgxjobs}" | grep -q " ${bgxpid} "; then
            bgxgrp="${bgxgrp} ${bgxpid}"
            ((bgxcount++))
        fi
    done
}

# bgxlimit - start a sub-process with a limit.
#   Loops, calling bgxupdate until there is a free
#   slot to run another sub-process. Then runs it
#   and updates the process group.
# in:  $1     - the limit on processes.
# in:  $2+    - the command to run for new process.
# in:  bgxgrp - the current group of processes.
# out: bgxgrp - new group of processes.

bgxlimit() {
    bgxmax=$1; shift
    bgxupdate
    while [[ ${bgxcount} -ge ${bgxmax} ]]; do
        sleep 1
        bgxupdate
    done
    if [[ "$1" != "-" ]]; then
        "$@" &
        bgxgrp="${bgxgrp} $!"
    fi
}

# Test program, create group and run 6 sleeps with
#   limit of 3.

group1=""
echo 0 $(date | awk '{print $4}') '[' ${group1} ']'
echo
for i in 1 2 3 4 5 6; do
    bgxgrp=${group1}; bgxlimit 3 sleep ${i}0; group1=${bgxgrp}
    echo ${i} $(date | awk '{print $4}') '[' ${group1} ']'
done

# Wait until all others are finished.

echo
bgxgrp=${group1}; bgxupdate; group1=${bgxgrp}
while [[ ${bgxcount} -ne 0 ]]; do
    oldcount=${bgxcount}
    while [[ ${oldcount} -eq ${bgxcount} ]]; do
        sleep 1
        bgxgrp=${group1}; bgxupdate; group1=${bgxgrp}
    done
    echo 9 $(date | awk '{print $4}') '[' ${group1} ']'
done

Here’s a sample run, with blank lines inserted to clearly delineate different time points:

0 12:38:00 [ ]
1 12:38:00 [ 3368 ]
2 12:38:00 [ 3368 5880 ]
3 12:38:00 [ 3368 5880 2524 ]

4 12:38:10 [ 5880 2524 1560 ]

5 12:38:20 [ 2524 1560 5032 ]

6 12:38:30 [ 1560 5032 5212 ]

9 12:38:50 [ 5032 5212 ]

9 12:39:10 [ 5212 ]

9 12:39:30 [ ]

  • The whole thing starts at 12:38:00 (time t = 0) and, as you can see, the first three processes run immediately.
  • Each process sleeps for 10n seconds and the fourth process doesn’t start until the first exits (at time t = 10). You can see that process 3368 has disappeared from the list before 1560 is added.
  • Similarly, the fifth process 5032 starts when 5880 (the second) exits at time t = 20.
  • And finally, the sixth process 5212 starts when 2524 (the third) exits at time t = 30.
  • Then the rundown begins, the fourth process exits at time t = 50 (started at 10 with 40 duration).
  • The fifth exits at time t = 70 (started at 20 with 50 duration).
  • Finally, the sixth exits at time t = 90 (started at 30 with 60 duration).

Or, if you prefer it in a more graphical time-line form:

Process:  1  2  3  4  5  6 
--------  -  -  -  -  -  -
12:38:00  ^  ^  ^            1/2/3 start together.
12:38:10  v  |  |  ^         4 starts when 1 done.
12:38:20     v  |  |  ^      5 starts when 2 done.
12:38:30        v  |  |  ^   6 starts when 3 done.
12:38:40           |  |  |
12:38:50           v  |  |   4 ends.
12:39:00              |  |
12:39:10              v  |   5 ends.
12:39:20                 |
12:39:30                 v   6 ends.
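To demonstrate the multiple-groups point, here is a sketch that drives two fully independent groups with different limits. The bgxupdate and bgxlimit functions from above are reproduced, lightly condensed (the "-" special case is dropped), so the block runs standalone; the limits and sleep durations are arbitrary demo values.

```shell
#!/bin/bash
# Two independent process groups, each with its own limit.

# bgxupdate - prune finished pids from bgxgrp, set bgxcount.
bgxupdate() {
    bgxoldgrp=${bgxgrp}
    bgxgrp=""
    bgxcount=0
    bgxjobs=" $(jobs -pr | tr '\n' ' ')"
    for bgxpid in ${bgxoldgrp}; do
        if echo "${bgxjobs}" | grep -q " ${bgxpid} "; then
            bgxgrp="${bgxgrp} ${bgxpid}"
            bgxcount=$((bgxcount + 1))
        fi
    done
}

# bgxlimit - wait for a free slot (limit $1), then run the rest.
bgxlimit() {
    bgxmax=$1; shift
    bgxupdate
    while [[ ${bgxcount} -ge ${bgxmax} ]]; do
        sleep 1
        bgxupdate
    done
    "$@" &
    bgxgrp="${bgxgrp} $!"
}

groupA=""; groupB=""          # two fully independent groups
for i in 1 2 3; do
    bgxgrp=${groupA}; bgxlimit 2 sleep 2; groupA=${bgxgrp}  # at most 2
    bgxgrp=${groupB}; bgxlimit 1 sleep 2; groupB=${bgxgrp}  # at most 1
done
wait
```

Group A admits two sleeps at once while group B serializes its three, even though both sets of pids live in the same shell session.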

Here's the shortest way (wait -n requires bash 4.3 or newer):

waitforjobs() {
    while test $(jobs -p | wc -w) -ge "$1"; do wait -n; done
}

Call this function before forking off any new job:

waitforjobs 10
run_another_job &

To have as many background jobs as cores on the machine, use $(nproc) instead of a fixed number like 10.
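Putting it together, a self-contained sketch; the real work is replaced by a hypothetical do_work placeholder, and the limit of 3 is arbitrary:

```shell
#!/bin/bash
# Keep at most N background jobs; wait -n needs bash >= 4.3.
waitforjobs() {
    while test "$(jobs -p | wc -w)" -ge "$1"; do wait -n; done
}

do_work() {          # hypothetical stand-in for the real job
    sleep 0.2
}

for i in 1 2 3 4 5 6; do
    waitforjobs 3    # or: waitforjobs "$(nproc)"
    do_work &
done
wait                 # let the last jobs finish before exiting
```

The final wait matters: without it the script can exit while the last batch of jobs is still running.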