Bash split a list of files

Method #1 - Using head & tail

You can use the command head to pull out the first 40 files from a file listing like so:

$ head -40 input_file | xargs ...

To get the next 40:

$ tail -n +41 input_file  | head -40 | xargs ...

...

$ tail -n +161 input_file | head -40 | xargs ...

You can keep walking down the list, 40 at a time using this same technique.
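The walk down the list can also be put in a loop instead of typed out by hand. This is only a minimal sketch: walk_chunks is a made-up helper name, the list is generated with seq as a stand-in for a real input_file, and echo stands in for whatever command you would hand to xargs.

```shell
# walk_chunks FILE SIZE: run one command per SIZE lines of FILE.
# (hypothetical helper; echo is a stand-in for the real command)
walk_chunks() {
  wc_file=$1
  wc_size=$2
  # $(( ... )) strips the padding some wc implementations add
  wc_total=$(( $(wc -l < "$wc_file") ))
  wc_start=1
  while [ "$wc_start" -le "$wc_total" ]; do
    # lines wc_start .. wc_start+wc_size-1 become one argument list
    tail -n +"$wc_start" "$wc_file" | head -n "$wc_size" | xargs echo
    wc_start=$((wc_start + wc_size))
  done
}

# Sample list: 200 names, one per line
input_file=$(mktemp)
seq 200 > "$input_file"
walk_chunks "$input_file" 40
```

Each pass re-reads the file from the top, so this is fine for short lists but wasteful for very long ones.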

Method #2 - Using xargs

If you happen to have all your filenames in a variable, you can use xargs to break the list up into chunks of X elements each.

Example

Pretend my files are called 1-200. So I load them up into a variable like so:

$ files=$(seq 200)

You can see the first couple of items in this variable:

$ echo $files  | head -c 20
1 2 3 4 5 6 7 8 9 10

Now we use xargs to divide it up:

$ xargs -n 40 <<<$files
1 2 3 4 5 6 7 8 9 10 ...
41 42 43 44 45 46 47 ...
81 82 83 84 85 86 87 ...
121 122 123 124 125 ...
161 162 163 164 165 ...

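You could then pipe the output of the above command to another xargs which would then run your program. Give the second xargs -L 1 so that each line of 40 stays together as one invocation; without it, xargs would join the lines back into one long argument list:

$ xargs -n 40 <<<$files | xargs -L 1 ...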

If the list of files isn't readily available in a variable, you can have xargs read it from a file instead:

$ xargs -n 40 <input_file
1 2 3 4 5 6 7 8 9 10 ...
41 42 43 44 45 46 47 ...
81 82 83 84 85 86 87 ...
121 122 123 124 125 ...
161 162 163 164 165 ...
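xargs can also run the program itself once per chunk, rather than just printing the groups. A rough sketch, assuming a list file with one name per line; the sh -c one-liner is only a stand-in for your real program, and it reports how many arguments it was given and the first of them:

```shell
# Sample list: 200 names, one per line
input_file=$(mktemp)
seq 200 > "$input_file"

# One invocation per 40 arguments. The trailing "sh" fills $0,
# so the chunk itself lands in $1, $2, ...
report=$(xargs -n 40 sh -c 'echo "$# files, first is $1"' sh < "$input_file")
printf '%s\n' "$report"
```

With 200 names this runs the stand-in five times, once per group of 40.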

Method #3 - Bash arrays

Say you had your filenames in a Bash array. Again I'm using a sequence of numbers, 1-200, to represent my filenames.

$ foo=( $(seq 200) )

You can see the contents of the array like so:

$ echo ${foo[@]}
1 2 3 4 5 ....

Now to get the 1st 40:

$ echo "${foo[@]:0:40}"

The 2nd 40, etc:

$ echo "${foo[@]:40:40}"
...
$ echo "${foo[@]:160:40}"
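The slices can be generated in a loop rather than written out one by one. A minimal sketch, assuming Bash (arrays and the C-style for loop are not POSIX sh); print_batches is a hypothetical helper name, and the echo is where a real command would receive "${batch[@]}":

```shell
# print_batches SIZE ITEM...: walk the items SIZE at a time.
# (hypothetical helper; echo stands in for the real work)
print_batches() {
  local chunk=$1; shift
  local items=( "$@" )
  local i batch
  for ((i = 0; i < ${#items[@]}; i += chunk)); do
    # same slice syntax as above: offset i, length chunk
    batch=( "${items[@]:i:chunk}" )
    echo "offset $i: ${#batch[@]} items, ${batch[0]}..${batch[${#batch[@]}-1]}"
  done
}

foo=( $(seq 200) )
print_batches 40 "${foo[@]}"
```

The last slice simply comes out shorter when the array length is not a multiple of the chunk size.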

This is a perfect recipe for xargs:

cat list_of_files | xargs -n 40 command

Quoting from man xargs:

 -n number   Set the maximum number of arguments taken from standard input
             for each invocation of the utility.  An invocation of utility
             will use less than number standard input arguments if the
             number of bytes accumulated (see the -s option) exceeds the
             specified size or there are fewer than number arguments
             remaining for the last invocation of utility.  The current
             default value for number is 5000.
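The "fewer than number arguments remaining" case is easy to see on a small input. A quick illustration with seven items and -n 3, relying on xargs running echo when no utility is named:

```shell
# 7 items, at most 3 per invocation: the last line gets the remainder
out=$(seq 7 | xargs -n 3)
printf '%s\n' "$out"
# 1 2 3
# 4 5 6
# 7
```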

In order to perform different actions for each set, you'd need to get relevant lines before passing those to xargs:

 sed -n '1,40p' list_of_files | xargs command1
 sed -n '41,80p' list_of_files | xargs command2
 ...     
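If you would rather not work out the line ranges by hand, they can be computed. A small sketch, where nth_chunk is a made-up helper name, the list is faked with seq, and echo stands in for command1 and command2:

```shell
# nth_chunk FILE N SIZE: print the lines of the Nth chunk (N starts at 1).
# (hypothetical helper)
nth_chunk() {
  nc_file=$1
  nc_start=$(( ($2 - 1) * $3 + 1 ))
  nc_end=$(( $2 * $3 ))
  sed -n "${nc_start},${nc_end}p" "$nc_file"
}

# Sample list: 200 names, one per line
list_of_files=$(mktemp)
seq 200 > "$list_of_files"

# A different command per set (echo is a stand-in for both here)
nth_chunk "$list_of_files" 1 40 | xargs echo "first set:"
nth_chunk "$list_of_files" 2 40 | xargs echo "second set:"
```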

FYI, I love the xargs -n 40 <<<$files approach, but since it puts 40 args on each line, I shrink the chunk size to spread the work across threads:

threads=10
xargs -n $((40/threads)) <<<$files

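or, if the items are in an array (rounding up so that -n is never 0 when there are fewer items than threads):

n=(1 2 3 4 5 6)
per=$(( (${#n[@]} + threads - 1) / threads ))
printf '%s\n' "${n[@]}" | xargs -n "$per"

# one background job per line of xargs output
while read -r input; do
  for item in $input; do
    <..stuff..>
  done &
done < <(printf '%s\n' "${n[@]}" | xargs -n "$per")
wait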
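For anyone who wants to try the pattern end to end, here is a self-contained sketch under a few assumptions: Bash (for arrays and process substitution), 25 dummy items, and an echo into a per-job file standing in for the real work. The outdir and job names are mine and are only there so the result can be inspected afterwards.

```shell
n=( $(seq 1 25) )
threads=10
# ceiling division: 25 items over 10 threads gives 3 per batch
per=$(( (${#n[@]} + threads - 1) / threads ))

outdir=$(mktemp -d)   # one output file per background job
job=0
while read -r -a batch; do
  job=$((job + 1))
  {
    for item in "${batch[@]}"; do
      echo "$item"    # stand-in for the real work
    done > "$outdir/job_$job"
  } &
done < <(printf '%s\n' "${n[@]}" | xargs -n "$per")
wait                  # block until every background job is done

ls "$outdir"
```

Because the batch size is rounded up, fewer than $threads jobs can end up running (nine here: eight batches of 3 and one of 1).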