What prevents stdout/stderr from interleaving?

They do interleave! You only tried short output bursts, which remain unsplit, but in practice it's hard to guarantee that any particular output remains unsplit.

Output buffering

It depends on how the programs buffer their output. The stdio library, which most programs use for writing, buffers output to make it more efficient. Instead of outputting data as soon as the program calls a library function to write to a file, the function stores this data in a buffer, and only actually outputs the data once the buffer has filled up. This means that output is done in batches. More precisely, there are three output modes:

  • Unbuffered: the data is written immediately, without using a buffer. This can be slow if the program writes its output in small pieces, e.g. character by character. This is the default mode for standard error.
  • Fully buffered: the data is only written when the buffer is full. This is the default mode when writing to a pipe or to a regular file, except with stderr.
  • Line-buffered: the data is written after each newline, or when the buffer is full. This is the default mode when writing to a terminal, except with stderr.

Programs can reconfigure each stream to behave differently (in C, with setvbuf), and can explicitly flush the buffer (with fflush). The buffer is flushed automatically when a program closes the file or exits normally.

If all the programs that are writing to the same pipe either use line-buffered mode, or use unbuffered mode and write each line with a single call to an output function, and if the lines are short enough to write in a single chunk, then the output will be an interleaving of whole lines. But if one of the programs uses fully-buffered mode, or if the lines are too long, then you will see mixed lines.

Here is an example where I interleave the output from two programs. I used GNU coreutils on Linux; different versions of these utilities may behave differently.

  • yes aaaa writes aaaa forever in what is essentially equivalent to line-buffered mode. The yes utility actually writes multiple lines at a time, but each time it emits output, the output is a whole number of lines.
  • while true; do echo bbbb; done | grep b writes bbbb forever in fully-buffered mode. It uses a buffer size of 8192 bytes, and each line is 5 bytes long. Since 5 does not divide 8192, the boundaries between writes generally do not fall on line boundaries.

Let's pit them against each other.

$ { yes aaaa & while true; do echo bbbb; done | grep b & } | head -n 999999 | grep -e ab -e ba
bbaaaa
bbbbaaaa
baaaa
bbbaaaa
bbaaaa
bbbaaaa
ab
bbbbaaa

As you can see, yes sometimes interrupted grep and vice versa. Only about 0.001% of the lines got interrupted, but it happened. The interleaving is nondeterministic, so the number of interruptions will vary from run to run, but I saw at least a few interruptions every time. There would be a higher fraction of interrupted lines if the lines were longer, since the likelihood of an interruption increases as the number of lines per buffer decreases.
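The partial lines in this output match simple arithmetic on the sizes given above. As a rough sketch, assuming grep really flushes exactly 8192 bytes at a time (the helper name tail_bytes is made up for this illustration), the number of bytes of an unfinished line at the end of each write is:

```shell
# tail_bytes K BUF LINE: how many bytes of an unfinished line end the
# K-th write, when every write is BUF bytes and every line is LINE bytes.
tail_bytes() {
  echo $(( ($1 * $2) % $3 ))
}

for k in 1 2 3 4 5; do
  echo "write $k ends with $(tail_bytes "$k" 8192 5) byte(s) of a partial line"
done
```

The remainders cycle through 2, 4, 1, 3 and 0, which is consistent with the bbaaaa, bbbbaaaa, baaaa and bbbaaaa lines seen above: each is a fragment of b's left at the end of a grep write, followed by a whole line from yes.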

There are several ways to adjust output buffering. The main ones are:

  • Turn off buffering with stdbuf -o0, or switch to line buffering with stdbuf -oL. The stdbuf program is found in GNU coreutils and on some other systems such as FreeBSD. It only works with programs that use the stdio library and do not change its default buffering settings.
  • Switch to line buffering by directing the program's output through a terminal created just for this purpose with unbuffer. Beware that some programs change their behavior in other ways when their output is a terminal; for example, grep uses colors by default.
  • Configure the program, for example by passing --line-buffered to GNU grep.

Let's see the snippet above again, this time with line buffering on both sides.

$ { stdbuf -oL yes aaaa & while true; do echo bbbb; done | grep --line-buffered b & } | head -n 999999 | grep -e ab -e ba
abbbb
abbbb
abbbb
abbbb
abbbb
abbbb
abbbb
abbbb
abbbb
abbbb
abbbb
abbbb
abbbb

So this time yes never interrupted grep, but grep sometimes interrupted yes. I'll come to why later.

Pipe interleaving

As long as each program outputs one line at a time, and the lines are short enough, the output lines will be neatly separated. But there's a limit to how long the lines can be for this to work. The pipe itself has a transfer buffer. When a program outputs to a pipe, the data is copied from the writer program to the pipe's transfer buffer, and then later from the pipe's transfer buffer to the reader program. (At least conceptually — the kernel may sometimes optimize this to a single copy.)

If there's more data to copy than fits in the pipe's transfer buffer, then the kernel copies one bufferful at a time. If multiple programs are writing to the same pipe, and the first program that the kernel picks wants to write more than one bufferful, then there's no guarantee that the kernel will pick the same program again the second time. For example, if P is the buffer size, foo wants to write 2*P bytes and bar wants to write 3 bytes, then one possible interleaving is P bytes from foo, then 3 bytes from bar, and P bytes from foo.

Coming back to the yes+grep example above: on my system, yes aaaa happens to write as many lines as fit in an 8192-byte buffer in one go. Since each line is 5 bytes long (4 printable characters and the newline), it writes 8190 bytes every time. The pipe buffer size is 4096 bytes. It is therefore possible to get 4096 bytes from yes, then some output from grep, and then the rest of the write from yes (8190 - 4096 = 4094 bytes). 4096 bytes leaves room for 819 complete aaaa lines and a lone a. This lone a is followed by one write from grep, which gives a line with abbbb.
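The numbers in this paragraph can be reproduced with shell arithmetic. This is only a sketch using the sizes observed on my system (the variable names are made up for the illustration); plug in your own values if they differ.

```shell
# Sizes taken from the text; they may differ on other systems.
stdio_buf=8192   # buffer that yes fills with whole lines
pipe_buf=4096    # pipe buffer size
line=5           # "aaaa" plus the newline

write_size=$(( stdio_buf / line * line ))   # bytes per write, whole lines only
whole_lines=$(( pipe_buf / line ))          # complete lines in the first chunk
leftover=$(( pipe_buf % line ))             # bytes of a cut-off line after them
rest=$(( write_size - pipe_buf ))           # bytes delivered in the second chunk

echo "yes writes $write_size bytes per call"
echo "first chunk: $whole_lines whole lines plus $leftover stray byte(s)"
echo "second chunk: $rest bytes"
```

With the values above this reports 8190 bytes per call, 819 whole lines plus 1 stray byte in the first chunk, and 4094 bytes in the second chunk.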

If you want to see the details of what's going on, then getconf PIPE_BUF . will tell you the pipe buffer size on your system, and you can see a complete list of system calls made by each program with

strace -s9999 -f -o line_buffered.strace sh -c '{ stdbuf -oL yes aaaa & while true; do echo bbbb; done | grep --line-buffered b & }' | head -n 999999 | grep -e ab -e ba

How to guarantee clean line interleaving

If the line lengths are smaller than the pipe buffer size, then line buffering guarantees that there won't be any mixed lines in the output.
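To check whether some sample output satisfies this condition, you can compare its longest line against the value reported by getconf. The following is a minimal sketch: the function names are invented for this example, and it assumes that a write of at most PIPE_BUF bytes to a pipe is not split, which is what POSIX specifies.

```shell
# max_line_bytes: read lines on stdin and print the length in bytes of the
# longest one, counting its trailing newline. LC_ALL=C makes awk count bytes.
max_line_bytes() {
  LC_ALL=C awk '{ n = length($0) + 1; if (n > max) max = n } END { print max + 0 }'
}

# lines_fit_pipe_buf: succeed if every line on stdin is at most PIPE_BUF bytes.
lines_fit_pipe_buf() {
  limit=$(getconf PIPE_BUF /)
  [ "$(max_line_bytes)" -le "$limit" ]
}

printf 'aaaa\nbb\n' | max_line_bytes
```

For the two sample lines this prints 5. In a script you could run something like some_program | lines_fit_pipe_buf || echo "lines too long", where some_program stands for whatever produces the output.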

If the line lengths can be larger, there's no way to avoid arbitrary mixing when multiple programs are writing to the same pipe. To ensure separation, you need to make each program write to a different pipe, and use a program to combine the lines. For example GNU Parallel does this by default.
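As a very simplified sketch of the "separate outputs, then combine" idea, the hypothetical run_grouped function below gives each command its own temporary file instead of a shared pipe, and prints the files one after another once all commands have finished. This groups output per program rather than interleaving it line by line, and it only works for commands that terminate; it is an illustration, not how any particular tool is implemented.

```shell
# run_grouped CMD...: run each CMD (a shell command string) in the background
# with its own output file, wait for all of them, then print the files in order.
run_grouped() {
  tmpdir=$(mktemp -d) || return 1
  count=0
  pids=
  for cmd in "$@"; do
    count=$((count + 1))
    sh -c "$cmd" > "$tmpdir/out.$count" &
    pids="$pids $!"
  done
  wait $pids          # unquoted on purpose: one argument per process ID
  i=1
  while [ "$i" -le "$count" ]; do
    cat "$tmpdir/out.$i"
    i=$((i + 1))
  done
  rm -rf "$tmpdir"
}

run_grouped 'yes aaaa | head -n 3' 'yes bbbb | head -n 3'
```

Since the writers never share a pipe, no line can be cut in the middle no matter how long it is. The price is that nothing is printed until every command has finished.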