How can I determine if running tar will cause the disk to fill up?

tar -cf - data_dir | wc -c      (without compression)

or

tar -czf - data_dir | wc -c     (with gzip compression)

or

tar -cjf - data_dir | wc -c     (with bzip2 compression)

will print, in bytes, the size of the archive that would be created, without writing it to disk. You can then compare that with the amount of free space on your target device.
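The comparison with free space can be scripted. This is a minimal sketch, assuming a POSIX shell and a df that supports -P and -k; fits_on_disk is a hypothetical helper name, and the gzip variant of the command above is used:

```shell
# Hypothetical helper: would a gzip-compressed archive of $1 fit on the
# filesystem that holds the path $2?
fits_on_disk() {
  src=$1
  dest=$2
  # Stream the compressed archive into wc -c, so nothing is written to disk.
  # $(( )) strips the leading spaces some wc implementations print.
  need=$(( $(tar -czf - "$src" | wc -c) ))
  # -P: one line per filesystem; -k: 1024-byte units; column 4 is "Available".
  avail_kb=$(df -Pk "$dest" | awk 'NR == 2 { print $4 }')
  avail=$((avail_kb * 1024))
  echo "archive: $need bytes, available: $avail bytes"
  [ "$need" -lt "$avail" ]   # exit status 0 means it should fit
}
```

Usage, with a hypothetical mount point: `fits_on_disk data_dir /mnt/backup && tar -czf /mnt/backup/data.tar.gz data_dir`. Bear in mind that the check still reads and compresses all of the data, so it takes roughly as long as creating the archive itself.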

You can check the size of the data directory itself, in case an incorrect assumption was made about its size, with the following command:

du -h --max-depth=1 data_dir

As already answered, tar adds a 512-byte header to each file in the archive and pads each file's contents up to a multiple of 512 bytes. The end of an archive is marked by at least two consecutive zero-filled 512-byte blocks, and GNU tar additionally pads the whole archive to a multiple of its record size (10240 bytes by default). So an uncompressed tar file is always larger than the files themselves; the number of files, and how their sizes align to 512-byte boundaries, determines the extra space used.
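Those rules are enough for a rough estimate without running tar at all. A minimal sketch, assuming GNU find (for -printf) and a plain ustar-style layout; it ignores extra header blocks for long file names, pax headers, hard links and sparse files, and estimate_tar_size is a hypothetical helper name:

```shell
# Hypothetical helper: estimate the uncompressed size of an archive of $1.
estimate_tar_size() {
  find "$1" -printf '%y %s\n' | awk '
    { total += 512 }                         # one 512-byte header per entry
    $1 == "f" {                              # only regular files carry data
      total += int(($2 + 511) / 512) * 512   # data padded to a 512-byte boundary
    }
    END {
      total += 1024                          # two zero-filled end-of-archive blocks
      rec = 10240                            # GNU tar default record size (20 x 512)
      print int((total + rec - 1) / rec) * rec
    }'
}
```

For a directory holding one small file this gives 10240 bytes (two headers, one data block and the end marker, rounded up to a full record). For simple trees it should agree with `tar -cf - DIR | wc -c`, but treat it as an estimate.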

Of course, filesystems themselves allocate space in blocks that may be bigger than an individual file's contents, so be careful where you untar the archive: the filesystem may not be able to hold lots of small files even though its free space is greater than the size of the tar file!
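To see how much block allocation inflates a directory on a particular filesystem, GNU du can report both the apparent size (the bytes actually in the files) and the space allocated for them. A minimal sketch, assuming GNU coreutils (du -b, du -B1 and stat -f -c); show_overhead is a hypothetical helper name:

```shell
# Hypothetical helper: compare content size with allocated size for $1.
show_overhead() {
  dir=$1
  apparent=$(du -sb "$dir" | cut -f1)    # sum of file sizes (apparent size)
  allocated=$(du -sB1 "$dir" | cut -f1)  # bytes actually allocated in blocks
  bsize=$(stat -f -c %S "$dir")          # fundamental block size of the filesystem
  echo "apparent=$apparent allocated=$allocated block_size=$bsize"
}
```

Running it on the source directory and then on a copy extracted to the target filesystem shows how much the block size costs there; a large gap between the two numbers means many small files.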

https://en.wikipedia.org/wiki/Tar_(computing)#Format_details


The size of your tar file will be 937MB, plus the metadata needed for each file or directory (a 512-byte header per object), plus the padding added to align each file's contents to a 512-byte boundary.

A very rough calculation tells us that another copy of your data will leave you with 3.4GB free. In 3.4GB we have room for about 7 million metadata records, assuming no padding, or fewer if you assume an average of 256 bytes' padding per file. So if you have millions of files and directories to tar, you might run into problems.
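The arithmetic can be checked with shell integer arithmetic. This sketch assumes that "3.4GB" means 3.4 GiB; shell arithmetic is integer-only, so 3.4 is written as 34/10:

```shell
# Back-of-the-envelope check, assuming 3.4 GiB of free space.
free=$((34 * 1024 * 1024 * 1024 / 10))
echo "free bytes:             $free"
echo "entries, no padding:    $((free / 512))"
echo "entries, 256 B padding: $((free / (512 + 256)))"
```

This prints 3650722201 free bytes, room for 7130316 headers with no padding, and 4753544 entries if each also carries 256 bytes of padding on average, which matches the "about 7 million, or fewer" estimate.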

You could mitigate the problem by

  • compressing on the fly with tar's z (gzip) or j (bzip2) option
  • running tar as a normal user, so that the space reserved for root on the / partition won't be touched if you run out of space.
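To see roughly how much space is reserved for root (on ext2/3/4 this is 5% by default, set with mke2fs -m), compare df's figures. A minimal sketch, assuming a df that supports -P and -k; reserved_kb is a hypothetical helper name, and the result is only an approximation because other filesystem overhead can also show up in the gap:

```shell
# Hypothetical helper: approximate root-reserved space on the filesystem
# holding $1 (default /), in KiB. df columns: 2 = Size, 3 = Used, 4 = Available.
reserved_kb() {
  df -Pk "${1:-/}" | awk 'NR == 2 { print $2 - $3 - $4 }'
}
```

On an ext filesystem the exact value is the "Reserved block count" line of `tune2fs -l` run as root against the device (for example `/dev/DEVICE`, a placeholder), multiplied by the block size.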

tar itself can report on the size of its archives with the --totals option:

tar -cf - ./* | tar --totals -tvf -

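The above command writes nothing to disk and has the added benefit of listing the size of each file contained in the tarball. Adding the z, j or J (xz) option to both sides of the | pipe will compress and decompress the stream in transit; note, though, that the reading tar then reports the size of the decompressed archive, so pipe the compressing tar into wc -c instead if you want the compressed size.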

OUTPUT:

...
-rwxr-xr-x mikeserv/mikeserv         8 2014-03-13 20:58 ./somefile.sh
-rwxr-xr-x mikeserv/mikeserv        62 2014-03-13 20:53 ./somefile.txt
-rw-r--r-- mikeserv/mikeserv       574 2014-02-19 16:57 ./squash.sh
-rwxr-xr-x mikeserv/mikeserv        35 2014-01-28 17:25 ./ssh.shortcut
-rw-r--r-- mikeserv/mikeserv        51 2014-01-04 08:43 ./tab1.link
-rw-r--r-- mikeserv/mikeserv         0 2014-03-16 05:40 ./tee
-rw-r--r-- mikeserv/mikeserv         0 2014-04-08 10:00 ./typescript
-rw-r--r-- mikeserv/mikeserv       159 2014-02-26 18:32 ./vlc_out.sh
Total bytes read: 4300943360 (4.1GiB, 475MiB/s)
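If you only want the number, for example in a script, the totals line can be picked out of the reading tar's output. A minimal sketch, assuming GNU tar (whose --totals output is the "Total bytes read:" line above); archive_bytes is a hypothetical helper name:

```shell
# Hypothetical helper: print the uncompressed archive size, in bytes, of the
# paths given as arguments. Both output streams of the listing tar go to sed,
# which keeps only the number from the "Total bytes read:" line.
archive_bytes() {
  tar -cf - "$@" | tar --totals -tvf - 2>&1 |
    sed -n 's/^Total bytes read: \([0-9]*\).*/\1/p'
}
```

The result is always a multiple of 512 (with GNU tar's defaults, of 10240, as in the 4300943360 total above), so it can be compared directly with the free space reported by df.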

I'm not entirely sure of your purpose, but if it is to download the tarball, this might be more to the point:

ssh you@host 'tar -cf - ./*' > ./path/to/saved/local/tarball.tar

Or, to simply copy the files with tar:

ssh you@host 'tar -cf - ./*' | tar -C /path/to/download/tree/destination -xvf -
