Combining a large number of files

If you have root permissions on that machine, you can temporarily increase the "maximum number of open file descriptors" limit for the current shell:

ulimit -Hn 10240 # The hard limit
ulimit -Sn 10240 # The soft limit

Then run:

paste res.* >final.res

After that you can set it back to the original values.
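ulimit only changes the limits of the current shell and the processes it starts. Running the paste in a subshell therefore restores the original limit automatically. Also, raising the soft limit up to the existing hard limit does not need root; only going past the hard limit does. The following is a minimal sketch under those assumptions. It uses three hypothetical sample files in a scratch directory:

```shell
#!/bin/sh
# Sketch: raise the soft open-file limit only inside a subshell, so the
# original limit is back in effect as soon as the subshell exits.
set -eu
workdir=$(mktemp -d)               # scratch directory with hypothetical sample data
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf 'a1\na2\n' > res.1
printf 'b1\nb2\n' > res.2
printf 'c1\nc2\n' > res.3

before=$(ulimit -S -n)             # soft limit of the current shell
(
  # An unprivileged process may raise its soft limit up to the hard limit;
  # only going beyond the hard limit needs root.
  ulimit -S -n "$(ulimit -H -n)" 2>/dev/null ||
    echo "could not raise the soft limit; keeping $(ulimit -S -n)" >&2
  paste res.* > final.res
)
after=$(ulimit -S -n)              # unchanged: the subshell's limit did not leak out
echo "soft limit before: $before, after: $after"
```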


A second solution, if you cannot change the limit:

for f in res.*; do cat final.res | paste - "$f" >temp; cp temp final.res; done; rm temp

It calls paste once per file, and at the end final.res is one large file with all the columns (it takes a while).

Edit: Useless use of cat... Not!

As mentioned in the comments, the use of cat here (cat final.res | paste - $f >temp) is not useless. The first time the loop runs, final.res doesn't exist yet. Passing it directly to paste would make paste fail with an error and write nothing, so the contents of the first file would never make it into final.res. With my solution, only cat fails on the first iteration (with "No such file or directory"); paste simply reads an empty stdin and carries on. That error can be ignored.
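The cat trick has a side effect. On the first pass, paste reads an empty stdin, so every line of final.res starts with an empty field (a leading tab). If that matters, one alternative is to seed final.res with the first file and paste the remaining files onto it one at a time, which needs no cat and produces no error. This is a sketch that assumes the inputs are the res.* files in the current directory; the sample files below are hypothetical:

```shell
#!/bin/sh
# Sketch: the same one-file-at-a-time loop, seeded with the first input file.
set -eu
workdir=$(mktemp -d)                # scratch directory with hypothetical sample data
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf 'a1\na2\n' > res.1
printf 'b1\nb2\n' > res.2
printf 'c1\nc2\n' > res.3

set -- res.*                        # positional parameters = input files, in glob order
cp "$1" final.res                   # seed: final.res exists before the first paste
shift
for f in "$@"; do                   # each paste opens only two input files
  paste final.res "$f" > temp
  mv temp final.res
done
```

As with the original loop, the descriptor limit never comes into play. The cost is rewriting final.res once per input file.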


If chaos' answer isn't applicable (because you don't have the required permissions), you can batch up the paste calls as follows:

ls -1 res.* | split -l 1000 -d - lists
for list in lists*; do paste $(cat $list) > merge${list##lists}; done
paste merge* > final.res

This writes the file names 1000 at a time into files named lists00, lists01 etc., then pastes the corresponding res.* files into files named merge00, merge01 etc., and finally merges all the resulting partially merged files.
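This works because paste joins its inputs line by line. Pasting the partial merges side by side therefore gives the same columns, in the same order, as a single paste over all the files. The following is a small-scale check of that claim. It assumes GNU split (for -d) and uses 25 hypothetical two-line files with a batch size of 10:

```shell
#!/bin/sh
# Small-scale check that batching does not change the result.
set -eu
export LC_ALL=C                     # identical, byte-wise sort order for ls and the glob
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
i=1
while [ "$i" -le 25 ]; do           # 25 hypothetical input files, two lines each
  printf 'x%s\ny%s\n' "$i" "$i" > "res.$i"
  i=$((i + 1))
done

ls -1 res.* | split -l 10 -d - lists            # batches of 10: lists00 lists01 lists02
for list in lists*; do
  paste $(cat "$list") > "merge${list##lists}"  # at most 10 input files open per paste
done
paste merge* > final.res

paste res.* > direct.res            # one paste over everything, fine here because 25 is small
cmp final.res direct.res && echo "batched and direct results are identical"
```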

As mentioned by chaos, you can increase the number of files used at once; the limit is the value given by ulimit -n minus however many files you already have open, so you'd say

ls -1 res.* | split -l $(($(ulimit -n)-10)) -d - lists

to use the limit minus ten.
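The batch size is plain shell arithmetic. The same arithmetic tells you how many merge files the last step will open. That count must also stay below the limit, so two levels of pasting handle at most roughly (limit minus 10) squared files. In the following sketch, the file count of 50000 is a hypothetical value:

```shell
#!/bin/sh
# Sketch: derive the batch size from the soft open-file limit and check that
# the final paste over the merge files also stays under that limit.
set -eu
limit=$(ulimit -n)
if [ "$limit" = unlimited ]; then limit=1024; fi   # assumption: fall back to a common default
batch=$((limit - 10))             # headroom for descriptors the shell already has open
if [ "$batch" -lt 2 ]; then echo "open-file limit too low" >&2; exit 1; fi
nfiles=50000                      # hypothetical number of res.* files
nbatches=$(( (nfiles + batch - 1) / batch ))   # ceiling division = number of merge files
echo "batch size: $batch, merge files: $nbatches"
if [ "$nbatches" -gt "$batch" ]; then
  echo "too many merge files for one final paste; merge them in batches too" >&2
fi
```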

If your version of split doesn't support -d, you can remove it: all it does is tell split to use numeric suffixes. By default, the suffixes will be aa, ab etc. instead of 00, 01 etc.

If there are so many files that ls -1 res.* fails ("argument list too long"), you can replace it with find, which avoids that error. Unlike ls, find doesn't sort its output, so add sort to keep the columns in name order:

find . -maxdepth 1 -type f -name res.\* | sort | split -l 1000 -d - lists

(As pointed out by don_crissti, -1 shouldn't be necessary when piping ls's output; but I'm leaving it in to handle cases where ls is aliased with -C.)


Try running it this way:

ls res.*|xargs paste >final.res
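xargs only guarantees that each command line fits the system's argument-length limit; it knows nothing about open-file limits. If it does split the list, the separate paste runs write their output one after another, so rows are stacked instead of columns being added. The following example forces a split with the standard -n option on four hypothetical two-line files to show the effect:

```shell
#!/bin/sh
# Tiny demonstration of what happens when xargs splits the file list.
set -eu
workdir=$(mktemp -d)                # scratch directory with hypothetical sample data
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf 'a1\na2\n' > res.1
printf 'b1\nb2\n' > res.2
printf 'c1\nc2\n' > res.3
printf 'd1\nd2\n' > res.4

ls res.* | xargs paste > one_call.res        # short list: a single paste, 4 columns
ls res.* | xargs -n 2 paste > two_calls.res  # forced split: two pastes, output stacked
cat two_calls.res
```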

You can also split the files into batches and try something like the following (the {1..100} brace expansion needs bash or zsh):

paste res.{1..100} >final.100
paste res.{101..200} >final.200
...

and at the end combine the partial files (the [0-9] keeps final.res itself out of the glob if you run this a second time):

paste final.[0-9]* >final.res
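Writing the batches out by hand gets tedious, and a loop can generate them instead. The following sketch assumes the files are numbered res.1 to res.N without gaps; n, step and the part.* output names are hypothetical. The part names are zero-padded so that part.* globs in numeric order. Plain names like final.1000 and final.200 would not sort that way, because the glob sorts them as text:

```shell
#!/bin/sh
# Sketch: generate the res.1..res.$n batches with a loop instead of by hand.
set -eu
workdir=$(mktemp -d)               # scratch directory with hypothetical sample data
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
n=25       # hypothetical total number of files res.1 .. res.$n
step=10    # files per batch; keep it below `ulimit -n`
i=1
while [ "$i" -le "$n" ]; do printf 'r%s\n' "$i" > "res.$i"; i=$((i + 1)); done

start=1
while [ "$start" -le "$n" ]; do
  end=$((start + step - 1))
  if [ "$end" -gt "$n" ]; then end=$n; fi
  # zero-padded part names glob in numeric order: part.00001, part.00011, ...
  paste $(seq -f 'res.%.0f' "$start" "$end") > "$(printf 'part.%05d' "$start")"
  start=$((start + step))
done
paste part.* > final.res
```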