Can GNU parallel output stdout before the program has exited?

I think you're looking for --ungroup. The man page says:

--group  Group output. Output from each job is grouped
         together and is only printed when the command is finished. 

         --group is the default. Can be reversed with -u.

-u, of course, is a synonym for --ungroup.
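
To see the difference for yourself, here is a rough sketch. It assumes GNU parallel (not the moreutils tool of the same name) is on your PATH; the `stamp` helper and the `job` string are made up for illustration. Each line is stamped with the time it arrives on the pipe, so you can tell whether "start" showed up before the job finished.

```shell
# Stamp each line with the time it arrives, to see when output is released.
stamp() {
  while IFS= read -r line; do
    printf '%s %s\n' "$(date +%T)" "$line"
  done
}

# Illustrative job: print a line, wait two seconds, print another line.
job='echo "start {}"; sleep 2; echo "end {}"'

echo "default (--group):"
parallel -j2 "$job" ::: a b | stamp   # start/end of a job arrive together

echo "with -u:"
parallel -j2 -u "$job" ::: a b | stamp   # start arrives about 2s before end
```

With the default, the "start" and "end" lines of a job should carry the same timestamp, since nothing is printed until the job exits. With -u the "start" lines should be stamped roughly two seconds earlier than the "end" lines.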


To watch progress for a few parallel jobs, try --tmuxpane --fg:

parallel --tmuxpane --fg seq {} 10000000 ::: {1..100}

You could also be looking for -u or (more likely) --lb. From man parallel:

   --line-buffer
   --lb
       Buffer output on line basis. --group will keep the output together
       for a whole job. --ungroup allows output to mixup with half a line
       coming from one job and half a line coming from another job.
       --line-buffer fits between these two: GNU parallel will print a full
       line, but will allow for mixing lines of different jobs.

       --line-buffer takes more CPU power than both --group and --ungroup,
       but can be much faster than --group if the CPU is not the limiting
       factor.

       Normally --line-buffer does not buffer on disk, and can thus process
       an infinite amount of data, but it will buffer on disk when combined
       with: --keep-order, --results, --compress, and --files. This will
       make it as slow as --group and will limit output to the available
       disk space.

       With --keep-order --line-buffer will output lines from the first job
       while it is running, then lines from the second job while that is
       running. It will buffer full lines, but jobs will not mix. Compare:

         parallel -j0 'echo {};sleep {};echo {}' ::: 1 3 2 4
         parallel -j0 --lb 'echo {};sleep {};echo {}' ::: 1 3 2 4
         parallel -j0 -k --lb 'echo {};sleep {};echo {}' ::: 1 3 2 4

       See also: --group --ungroup

[...]

   --ungroup
   -u  Ungroup output.  Output is printed as soon as possible and bypasses
       GNU parallel internal processing. This may cause output from
       different commands to be mixed thus should only be used if you do not
       care about the output. Compare these:

         seq 4 | parallel -j0 \
           'sleep {};echo -n start{};sleep {};echo {}end'
         seq 4 | parallel -u -j0 \
           'sleep {};echo -n start{};sleep {};echo {}end'

       It also disables --tag. GNU parallel outputs faster with -u. Compare
       the speeds of these:

         parallel seq ::: 300000000 >/dev/null
         parallel -u seq ::: 300000000 >/dev/null
         parallel --line-buffer seq ::: 300000000 >/dev/null

       Can be reversed with --group.

       See also: --line-buffer --group
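
Since -u disables --tag, --lb is the one to pick if you want to see lines while the jobs are still running and also know which job each line came from. A small sketch (the command string is my own illustration, not from the man page; it assumes GNU parallel is installed):

```shell
# --tag prefixes every output line with the job's argument and a TAB.
# Combined with --lb, whole lines are printed as soon as they are complete.
parallel -j0 --lb --tag 'echo start; sleep {}; echo end' ::: 1 2
```

Each line comes out as the argument, a tab, and then the text, so the "end" line of job 1 can appear before the "end" line of job 2 and still be told apart.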

One example where -u shines is when stdout and stderr are mixed on the same line:

echo -n 'This is stdout (';echo -n stderr >&2 ; echo ')'

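This will be formatted wrongly with --lb and --group, because both buffer stdout and stderr separately, so the stderr text no longer lands in the middle of the stdout line.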

But even -u does not guarantee it will be formatted correctly due to the half-line mixing between processes: http://mywiki.wooledge.org/BashPitfalls#Non-atomic_writes_with_xargs_-P


My solution was to log each job's output to a file, watch it change in real time with tail -f <file>, and delete the file automatically when the job is done. I also found the --progress flag useful.

parallel --progress ./program {} '>' {}.log';' rm {}.log ::: A B C

Here each job runs ./program with one of the inputs A, B, or C and sends the program's output to the corresponding log file (A.log, B.log, C.log), which is removed once the program exits.
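
If you want to try this without a real program, here is a self-contained sketch. The `program` script below is a hypothetical stand-in that just prints one line per second, the temporary directory is only there to keep things tidy, and I left out --progress to keep the output short. It assumes GNU parallel is installed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for ./program: prints one line per second.
cat > program <<'EOF'
#!/bin/sh
for i in 1 2 3; do
  echo "$1: step $i"
  sleep 1
done
EOF
chmod +x program

# Same job layout as above, run in the background so the logs can be inspected.
parallel ./program {} '>' {}.log';' rm {}.log ::: A B C &

sleep 2               # give the jobs time to start writing
seen=$(cat A.log)     # one-off snapshot; interactively you would run: tail -f A.log
echo "$seen"

wait                  # all jobs are done at this point
remaining=$(ls ./*.log 2>/dev/null)
echo "logs left: ${remaining:-none}"

cd / && rm -rf "$workdir"
```

The snapshot shows whatever job A has written so far, and after `wait` the log files should be gone because each job removes its own log when it finishes.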
