Why does tar appear to skip file contents when output file is /dev/null?

It is a documented optimization:

When the archive is being created to /dev/null, GNU tar tries to minimize input and output operations. The Amanda backup system, when used with GNU tar, has an initial sizing pass which uses this feature.

This can happen with a variety of programs. For example, I once saw this behavior when simply running cp file /dev/null: instead of giving me an estimate of my disk's read speed, the command returned after a few milliseconds.

As far as I remember, that was on Solaris or AIX, but the principle applies to all kinds of unix-y systems.

Traditionally, a program copying a file would alternate between two system calls: read, which copies a chunk of data from disk (or whatever the file descriptor refers to) into memory, with the guarantee that all of it is there when read returns; and write, which takes that chunk of memory and sends its contents to the destination.
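A minimal C sketch of that traditional read/write loop, assuming a POSIX system. The function name copy_read_write and the 64 KiB buffer size are illustrative choices, not taken from cp or tar:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd the traditional way: read() a chunk
 * into a user-space buffer (the data is guaranteed to be in memory when
 * read() returns), then write() that chunk out.
 * Returns the number of bytes copied, or -1 on error. */
long long copy_read_write(int in_fd, int out_fd)
{
    char buf[64 * 1024];
    long long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;            /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return -1;
        }
        /* write() may accept fewer bytes than requested, so loop. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Because every byte passes through buf, the data has to come off the disk even when out_fd refers to /dev/null.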

However, there are at least two newer ways to achieve the same:

  • Linux has the system calls copy_file_range (essentially Linux-specific) and sendfile (somewhat portable; originally intended for sending a file to a network socket, but on Linux it can now write to any destination). Both are meant to optimize transfers, so if a program uses one of them, it's easily conceivable that the kernel recognizes the target is /dev/null and turns the system call into a no-op.

  • Programs can use mmap instead of read to get at the file contents. This basically means "make sure the data is there when I try to access that chunk of memory" instead of "make sure the data is there when the system call returns". So a program can mmap the source file, then call write on that chunk of mapped memory. However, since writing to /dev/null doesn't need to access the written data, the "make sure it's there" condition is never triggered, and so the file is never read either.
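For comparison, a rough C sketch of both newer approaches, assuming Linux (sendfile is declared in <sys/sendfile.h> there; on other systems it has a different signature or is missing). The function names are hypothetical, and this only shows the shape of the calls, not what any particular cp or tar actually does:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/sendfile.h>   /* Linux-specific */
#include <sys/stat.h>
#include <unistd.h>

/* Variant 1: let the kernel move the data with sendfile(2), starting at
 * in_fd's current file offset. No user-space buffer is involved; what the
 * kernel does for a particular destination is up to the kernel. */
long long copy_sendfile(int in_fd, int out_fd)
{
    struct stat st;
    if (fstat(in_fd, &st) < 0)
        return -1;

    long long total = 0;
    while (total < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, NULL,
                             (size_t)(st.st_size - total));
        if (n < 0)
            return -1;
        if (n == 0)
            break;                   /* file shrank underneath us */
        total += n;
    }
    return total;
}

/* Variant 2: map the whole source file and hand the mapping to write(2).
 * Pages of a mapping are only read from disk when something touches them;
 * this function itself never dereferences p. */
long long copy_mmap_write(int in_fd, int out_fd)
{
    struct stat st;
    if (fstat(in_fd, &st) < 0)
        return -1;
    if (st.st_size == 0)
        return 0;                    /* mmap() rejects a zero length */

    size_t len = (size_t)st.st_size;
    char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in_fd, 0);
    if (p == MAP_FAILED)
        return -1;

    size_t off = 0;
    while (off < len) {
        ssize_t w = write(out_fd, p + off, len - off);
        if (w < 0) {
            munmap(p, len);
            return -1;
        }
        off += (size_t)w;
    }
    munmap(p, len);
    return (long long)len;
}
```

In copy_mmap_write, only write ever sees the pointer. On Linux, the /dev/null driver's write handler simply reports the requested length as written without looking at the buffer, so the mapped pages need never be faulted in, which is exactly the situation described in the second bullet.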

I'm not sure whether GNU tar uses either of these two mechanisms (and if so, which one) when it detects that it's writing to /dev/null, but they are the reason why any program used to check read speed should be run with | cat > /dev/null instead of > /dev/null, and why | cat > /dev/null should be avoided in all other cases, where the extra copy through a pipe only costs time.
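To see why | cat defeats such optimizations, here is a hypothetical C check a program could use to detect that its output is /dev/null. It is an illustration only, not GNU tar's actual code; the name fd_is_dev_null is made up. It compares device numbers obtained with fstat and stat rather than file names:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: does fd refer to the null device?
 * Compares the character-device number (st_rdev) with that of /dev/null,
 * so the answer does not depend on which path was used to open fd. */
bool fd_is_dev_null(int fd)
{
    struct stat fd_st, null_st;
    if (fstat(fd, &fd_st) < 0 || stat("/dev/null", &null_st) < 0)
        return false;
    return S_ISCHR(fd_st.st_mode) && fd_st.st_rdev == null_st.st_rdev;
}
```

With > /dev/null, the shell hands the program the null device itself as stdout, so fd_is_dev_null(STDOUT_FILENO) would return true. With | cat > /dev/null, the program's stdout is a pipe, the check returns false, and the program has to actually read the data to push it through the pipe; only cat writes to /dev/null.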