Why does tar appear to skip file contents when output file is /dev/null?
It is a documented optimization:
When the archive is being created to /dev/null, GNU tar tries to minimize input and output operations. The Amanda backup system, when used with GNU tar, has an initial sizing pass which uses this feature.
This can happen with a variety of programs. For example, I once saw the same behavior when just using cp file /dev/null: instead of giving me an estimate of my disk read speed, the command returned after a few milliseconds.
As far as I remember, that was on Solaris or AIX, but the principle applies to all kinds of unix-y systems.
In the old days, a program copying a file somewhere would alternate between read calls, which get some data from disk (or whatever the file descriptor refers to) into memory (with a guarantee that everything is there when read returns), and write calls, which take that chunk of memory and send its content to the destination.
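To make the classic pattern concrete, here is a minimal sketch in Python using the thin os.read / os.write wrappers around the system calls. The function name copy_read_write and the 64 KiB chunk size are my own choices for illustration, not anything a particular cp or tar implementation uses.

```python
import os


def copy_read_write(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path with alternating read/write calls.

    Returns the number of bytes written. The name and chunk size are
    illustrative assumptions.
    """
    total = 0
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while True:
                # read() only returns once the data is actually in memory.
                chunk = os.read(src, chunk_size)
                if not chunk:
                    break  # end of file
                view = memoryview(chunk)
                while view:
                    # write() may accept fewer bytes than offered, so loop.
                    written = os.write(dst, view)
                    view = view[written:]
                    total += written
        finally:
            os.close(dst)
    finally:
        os.close(src)
    return total
```

With this pattern every byte of the source has to pass through read, so the file really is read even if the destination is /dev/null.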
However, there are at least two newer ways to achieve the same:
Linux has the system calls copy_file_range (not portable to other unixes at all) and sendfile (somewhat portable; originally intended to send a file to the network, but can use any destination now). They're intended to optimize transfers; if the program uses one of those, it's easily conceivable that the kernel recognizes the target is /dev/null and turns the system call into a no-op.
Programs can use mmap to get the file contents instead of read. This basically means "make sure the data is there when I try to access that chunk of memory" instead of "make sure the data is there when the system call returns". So a program can mmap the source file, then call write on that chunk of mapped memory. However, as writing to /dev/null doesn't need to access the written data, the "make sure it's there" condition is never triggered, so the file is never read either.
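The mmap variant can be sketched like this. It is only an illustration of the pattern, assuming a regular source file; the function name copy_mmap_write is made up, and I'm not claiming any specific tool is implemented this way.

```python
import mmap
import os


def copy_mmap_write(src_path, dst_path):
    """Copy src_path to dst_path by mmap-ing the source and write()-ing
    the mapped memory. Returns the number of bytes written.

    The name is an illustrative assumption.
    """
    total = 0
    src = os.open(src_path, os.O_RDONLY)
    try:
        size = os.fstat(src).st_size
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            if size == 0:
                return 0  # an empty file cannot be mapped
            # Mapping reads nothing yet; pages are only fetched from disk
            # when something actually touches them.
            with mmap.mmap(src, size, access=mmap.ACCESS_READ) as mapped:
                with memoryview(mapped) as view:
                    while total < size:
                        # Only a pointer to the mapping is handed to write();
                        # Python does not copy or inspect the bytes here.
                        total += os.write(dst, view[total:])
        finally:
            os.close(dst)
    finally:
        os.close(src)
    return total
```

When dst_path is a real file, the kernel has to fetch the mapped pages to write them out. When it is /dev/null, nothing ever touches those pages, so the call can report the full size without the source being read from disk.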
I'm not sure whether GNU tar uses either of these two mechanisms when it detects it's writing to /dev/null, or which one, but they're the reason why any program, when used to check read speeds, should be run with | cat > /dev/null instead of > /dev/null, and why | cat > /dev/null should be avoided in all other cases.
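If you'd rather measure from a script than from the shell, the same trick can be sketched in Python: give the program a pipe as its stdout, so it cannot tell the data is being thrown away, and discard the bytes yourself. The function name time_through_pipe and the 1 MiB read size are my own assumptions for illustration.

```python
import subprocess
import time


def time_through_pipe(cmd, chunk_size=1 << 20):
    """Run cmd with stdout connected to a pipe, discard the output, and
    return (bytes_read, elapsed_seconds, exit_status).

    Roughly equivalent to `cmd | cat > /dev/null`; the name and chunk
    size are illustrative assumptions.
    """
    start = time.perf_counter()
    total = 0
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break  # the child closed its end of the pipe
            total += len(chunk)
    # Leaving the with-block waits for the child, so returncode is set.
    elapsed = time.perf_counter() - start
    return total, elapsed, proc.returncode
```

For example, time_through_pipe(["tar", "-cf", "-", "some_dir"]) would time a tar run that writes its archive to stdout, with some_dir standing in for whatever directory you want to read. Keep in mind that the result includes the pipe overhead and depends on what is already in the page cache.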