Piping commands with very large output

When the producer (tar) writes to the pipe faster than the consumer (lzip) can read from it, tar blocks until lzip has caught up. There is a small kernel buffer associated with the pipe (64 KiB by default on Linux), but its size is far smaller than most tar archives. There is no risk of filling up your system's RAM with your pipeline.
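
If you want to see this limit in action, here is a minimal throwaway demonstration (the sizes are arbitrary; this assumes a Linux system with GNU dd):

# 'sleep' never reads from its end of the pipe, so dd can only deposit
# about 64 KiB (the default pipe capacity on Linux) before blocking in
# write(). When sleep exits, dd is killed by SIGPIPE and never prints
# its usual transfer statistics.
dd if=/dev/zero bs=1M count=10 | sleep 10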

"Blocking" simply means that when tar does a call to the write() library function (or equivalent), the call won't return until the data has been delivered to the pipe buffer, which could take a bit of time if lzip is slow to read from that same buffer. You should be able to see this in top where tar would slow down and sleep a lot compared to lzip (assuming tar is in fact quicker than lzip).

A plain pipe therefore never holds a significant amount of data in RAM. If you did want a large in-memory buffer between the two programs, you could insert something like pv in the middle with a large buffer size (here, a gigabyte):

tar -cvf - /tmp/source-dir | pv --buffer-size 1G | lzip -o /media/my-usb/result.lz -

This would still block tar whenever pv blocks. pv would block when its buffer is full and it can't write to lzip.
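
As a side note, recent pv versions also have a --buffer-percent (-T) flag that shows how full pv's own buffer is, which makes the speed difference between tar and lzip directly visible (a variant of the command above; -v is dropped from tar so its file listing doesn't clobber pv's display on stderr):

tar -cf - /tmp/source-dir | pv --buffer-size 1G --buffer-percent | lzip -o /media/my-usb/result.lz -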


The reverse situation works in a similar way, i.e. if a slow producer on the left-hand side of a pipe writes to a fast consumer on the right, the consumer blocks in read() until there is data to be read from the pipe.
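
A small demonstration of a consumer blocking on an empty pipe (any slow producer would do here):

# cat starts immediately, but its read() blocks until each line arrives,
# so "second" only appears after the 3-second pause:
(echo first; sleep 3; echo second) | cat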

This data I/O is the only thing that synchronises the processes taking part in a pipeline. Apart from reading and writing (and occasionally blocking while waiting for the other end to read or write), they run independently of each other.
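
You can see that the stages of a pipeline start concurrently rather than one after another (a sketch using date as a crude timestamp):

# Both timestamps print at essentially the same moment; the right-hand
# side then simply blocks in read() until the payload shows up:
(date >&2; sleep 2; echo payload) | (date >&2; cat)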


GNU tar has a --lzip option to "filter the archive through lzip", so you may want to use this instead and skip the pipe entirely:

tar --lzip -cvf /media/my-usb/result.lz /tmp/source-dir
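
Either way, you can sanity-check the result afterwards (both commands should exit with status 0):

# test the compressed stream's integrity, then list the archive contents:
lzip -t /media/my-usb/result.lz
tar --lzip -tf /media/my-usb/result.lz > /dev/null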

To answer the question directly: in your case the system will manage the pipe properly on its own, using the default pipe buffer size.