How to run parallel processes and combine their outputs when both have finished

Use wait. For example:

Data1 ... > Data1Res.csv &
Data2 ... > Data2Res.csv &
wait
AnalysisProg

will:

  • run the Data1 and Data2 pipes as background jobs
  • wait for them both to finish
  • run AnalysisProg.
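Below is a self-contained sketch of the same pattern. `sort` stands in for the elided Data1/Data2 pipelines and `cat` stands in for AnalysisProg; both stand-ins are assumptions for illustration. It also saves each job's PID from `$!`, so that `wait PID` returns that job's exit status. A bare `wait` returns 0 no matter how the background jobs exited.

```shell
#!/bin/sh
# Stand-in for "Data1 ... > Data1Res.csv &" (hypothetical pipeline)
printf 'b,2\na,1\n' | sort > Data1Res.csv &
pid1=$!
# Stand-in for "Data2 ... > Data2Res.csv &" (hypothetical pipeline)
printf 'd,4\nc,3\n' | sort > Data2Res.csv &
pid2=$!

# "wait PID" blocks until that job ends and returns its exit status
wait "$pid1" || echo "Data1 pipeline failed" >&2
wait "$pid2" || echo "Data2 pipeline failed" >&2

# Stand-in for AnalysisProg: combine the two result files
cat Data1Res.csv Data2Res.csv > Combined.csv
```

Combined.csv is only written after both background jobs have exited, which is the whole point of `wait`.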

See, e.g., this question.


cxw's answer is no doubt the preferable solution if you only have 2 files. If the 2 files are just examples and you actually have 10000 files, then the '&' solution will not work: starting 10000 background jobs at once will overload your server. For that you need a tool like GNU Parallel, which limits how many jobs run at the same time:

ls Data* | parallel 'cat {} | this | that | theother | grep | sed | awk | whatever > {}res.csv'
AnalysisProg -i *res.csv
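What GNU Parallel adds over plain '&' is a cap on concurrent jobs: by default it runs about as many jobs as there are CPUs, and `-j N` sets the limit explicitly. If GNU Parallel is not installed, a crude version of that cap can be sketched with plain `wait` by starting jobs in batches. In this sketch the ten input files, the `max_jobs` value, `sort` as the pipeline and `cat` as AnalysisProg are all hypothetical stand-ins.

```shell
#!/bin/sh
# Hypothetical inputs: ten small files Data1 ... Data10
i=1
while [ "$i" -le 10 ]; do
  printf 'row%s,%s\n' "$i" "$i" > "Data$i"
  i=$((i + 1))
done

max_jobs=4   # assumed cap; pick something near your CPU count
running=0
for f in Data*; do
  # skip result files left over from earlier runs
  case $f in *[Rr]es.csv) continue ;; esac
  # stand-in for the real pipeline (cat {} | this | that | ...)
  sort "$f" > "${f}res.csv" &
  running=$((running + 1))
  if [ "$running" -ge "$max_jobs" ]; then
    wait        # let the current batch finish before starting more
    running=0
  fi
done
wait            # catch the last, partial batch

# stand-in for: AnalysisProg -i *res.csv
cat Data*res.csv > Combined.csv
```

The weakness of this approach is that each batch waits for its slowest job, so CPUs can sit idle; GNU Parallel instead starts a new job as soon as any running one finishes, which is one reason a dedicated tool is the better choice for thousands of files.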

To learn more about GNU Parallel:

  • Watch the intro video for a quick introduction: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
  • Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.