cp vs. cat to copy a file

One more issue comes to my mind where cat vs. cp makes a significant difference:

By definition, cat will expand sparse files, filling in the gaps with "real" zero bytes, while cp at least can be told to preserve the holes.

Sparse files are files where sequences of zero bytes have been replaced by metadata to save space. You can test this by creating one with dd and duplicating it with the tools of your choice.

  1. Create a sparse file (changing to /tmp beforehand to avoid trouble - see final note):

    15> cd /tmp
    16> dd if=/dev/null of=sparsetest bs=512b seek=5 
    0+0 records in 
    0+0 records out 
    0 bytes (0 B) copied, 5.9256e-05 s, 0.0 kB/s
    
  2. Size it: it should not take up any space.

    17> du -sh sparsetest
    0       sparsetest
    
  3. Copy it with cp and check the size:

    18> cp sparsetest sparsecp
    19> du -sh sparsecp
    0       sparsecp
    
  4. Now copy it with cat and check the size:

    20> cat sparsetest > sparsecat
    21> du -sh sparsecat
    1.3M    sparsecat
    
  5. Try your preferred tools to check their behaviour.

  6. Don't forget to clean up.
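
The steps above can be sketched as one script that also prints the apparent size next to the allocated size, which makes the hole visible. This is only a sketch under assumptions: it relies on GNU coreutils (`du --apparent-size`, `cp --sparse=always`) and on a filesystem that supports holes, and the temporary directory with its cleanup trap is my own addition so that nothing is left behind.

```shell
#!/bin/sh
# Sketch: compare apparent vs. allocated size of a sparse file and its copies.
# Assumes GNU coreutils and a filesystem that supports holes.

workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT      # clean up when the script exits
cd "$workdir" || exit 1

# Same dd invocation as in step 1: 5 blocks of 512*512 bytes, all hole.
dd if=/dev/null of=sparsetest bs=512b seek=5 2>/dev/null

cp --sparse=always sparsetest sparsecp   # explicitly ask cp to recreate holes
cat sparsetest > sparsecat               # cat writes every zero byte out

for f in sparsetest sparsecp sparsecat; do
    apparent=$(du --apparent-size -k "$f" | cut -f1)   # logical length in KiB
    allocated=$(du -k "$f" | cut -f1)                  # blocks actually in use, in KiB
    echo "$f: apparent=${apparent}K allocated=${allocated}K"
done
```

All three files should report the same apparent size; how much is actually allocated for each copy depends on the filesystem and the tool versions, so treat the numbers you see as a property of your system rather than a guarantee.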

Final note of caution: Experiments like these have an inherent chance of raising your fame with your local sysadmin if you run them on a filesystem that is part of his backup plan or critical to the well-being of the system. Depending on his choice of backup tool, he might end up needing more tape media than he ever considered possible to back up that one file which occupies no disk space but gets expanded to terabytes of zeroes.

Other files which can be copied with neither cat nor cp include device-special files, etc. Whether your copying tool is able to duplicate the device node, or merrily copies its contents instead, depends on the implementation.
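
If you want to be sure a script never copies the contents of a device by accident, one option is to check the file type first. The following is a minimal sketch; `copy_regular` is a hypothetical helper name of my own, not an existing tool.

```shell
#!/bin/sh
# copy_regular SRC DST (hypothetical helper)
# Copies SRC to DST only if SRC is a regular file (or a symlink to one).
copy_regular() {
    src=$1
    dst=$2
    if [ ! -f "$src" ]; then
        echo "refusing to copy '$src': not a regular file" >&2
        return 1
    fi
    cp -- "$src" "$dst"
}
```

For example, `copy_regular /dev/zero zeros` returns a non-zero status instead of filling the disk, while `copy_regular /etc/hostname hostname.bak` behaves like a plain cp.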


According to Keith's comment, cp preserves some permissions, while cat creates the new file with whatever permissions the umask dictates. So $2's permissions are not preserved, which means $4/vmlinuz ends up with clean, umask-derived permissions, while any strange permission set on $3 will be carried over to $4/System.map.
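
A quick way to see this difference is to copy a file with unusual mode bits both ways and compare the results. This is a sketch under assumptions: it uses GNU `stat -c '%a'` to print the mode, and the file names and the 0750 mode are arbitrary choices of mine.

```shell
#!/bin/sh
# Sketch: compare the mode of a file created by cp with one created by cat.
# Assumes GNU stat; file names are made up for the demonstration.

workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
umask 022

echo data > src
chmod 0750 src        # give the source a non-default mode

cp src by_cp          # new file takes the source mode, filtered by the umask
cat src > by_cat      # new file is created by the shell: 0666 filtered by the umask

mode_cp=$(stat -c '%a' by_cp)
mode_cat=$(stat -c '%a' by_cat)
echo "cp: $mode_cp  cat: $mode_cat"
# prints "cp: 750  cat: 644"
```

Note that this only concerns newly created destinations; if the target file already exists, both approaches overwrite its contents and leave its existing mode alone.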


Both have equivalent functionality in those two cases, but cp is purely a file operation. "Take this file and make a copy of it over there".

cat, on the other hand, is intended to dump the contents of a file out to the console. "Take this file and display it on the screen" and then have a ninja attack the screen and redirect the output elsewhere.

cp would generally be more efficient, as there's no redirection going on, merely a direct copying of bytes from location A to location B.

cat would be: read bytes -> write to standard output -> which the shell has redirected to the new file.