Parallelize a Bash FOR Loop

Sample task

task(){
   sleep 0.5; echo "$1";
}

Sequential runs

for thing in a b c d e f g; do 
   task "$thing"
done

Parallel runs

for thing in a b c d e f g; do 
  task "$thing" &
done

Parallel runs in N-process batches

N=4
(
for thing in a b c d e f g; do 
   ((i=i%N)); ((i++==0)) && wait
   task "$thing" & 
done
)

It's also possible to use FIFOs as semaphores and use them to ensure that new processes are spawned as soon as possible and that no more than N processes runs at the same time. But it requires more code.

N processes with a FIFO-based semaphore:

# initialize a semaphore with a given number of tokens
open_sem(){
    mkfifo pipe-$$
    exec 3<>pipe-$$
    rm pipe-$$
    local i=$1
    for((;i>0;i--)); do
        printf %s 000 >&3
    done
}

# run the given command asynchronously and pop/push tokens
run_with_lock(){
    local x
    # this read waits until there is something to read
    read -u 3 -n 3 x && ((0==x)) || exit $x
    (
     ( "$@"; )
    # push the return code of the command to the semaphore
    printf '%.3d' $? >&3
    )&
}

N=4
open_sem $N
for thing in {a..g}; do
    run_with_lock task $thing
done 

Explanation:

We use file descriptor 3 as a semaphore by pushing (=printf) and poping (=read) tokens ('000'). By pushing the return code of the executed tasks, we can abort if something went wrong.


Why don't you just fork (aka. background) them?

foo () {
    local run=$1
    fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
    fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
    fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
    fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}

for run in $runList; do foo "$run" & done

In case that's not clear, the significant part is here:

for run in $runList; do foo "$run" & done
                                   ^

Causing the function to be executed in a forked shell in the background. That's parallel.


for stuff in things
do
( something
  with
  stuff ) &
done
wait # for all the something with stuff

Whether it actually works depends on your commands; I'm not familiar with them. The rm *.mat looks a bit prone to conflicts if it runs in parallel...