How to obtain maximum compression with .tar.gz?

Or, you can tell tar to use maximum compression this way:

export GZIP=-9
tar cvzf file.tar.gz /path/to/directory

Additionally, to avoid leaving GZIP set in your shell environment, you can scope it to a single command:

env GZIP=-9 tar cvzf file.tar.gz /path/to/directory
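Some newer gzip releases print a warning that the GZIP variable is deprecated. If that bothers you, here is a minimal sketch of an alternative that passes the level on the command line instead. It assumes GNU tar, whose -I (--use-compress-program) option accepts a command with arguments; the function name targz_max is just an illustration.

```shell
# Hypothetical helper: archive a directory at gzip level 9 without
# touching the GZIP environment variable.
# Assumption: GNU tar, where -I takes a compressor command plus arguments.
targz_max() {
    out=$1   # archive to create, e.g. file.tar.gz
    dir=$2   # directory to archive
    tar -I 'gzip -9' -cf "$out" "$dir"
}
```

Usage would be `targz_max file.tar.gz /path/to/directory`. On a tar that is not GNU tar, -I may mean something else entirely, so check your man page first.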

As you said, "tar can also compress", which implies that tar does not always compress data. It does so only when used with the z option, and even then not by itself: it passes the archived data through gzip.

Instead, as noted in this answer, you can pipe tar into gzip yourself, which lets you specify the gzip compression level explicitly and so get the smallest output that gzip can produce.

tar cvf - /path/to/directory | gzip -9 - > file.tar.gz

Here -9 selects the maximum compression level that gzip offers.


Usually neither gzip nor tar can create "the absolute smallest tar.gz". There are many compression utilities that can compress to the gz format. I have written a bash script "gz99" that tries gzip, 7z and advdef to get the smallest file. To use it to create the smallest possible file, run:

tar c path/to/data | gz99 file.gz

The advdef utility from AdvanceCOMP usually gives the smallest file, but is also buggy (the gz99 utility checks that it hasn't corrupted the file before accepting the output of advdef). To use advdef directly, create file.tar.gz however you feel like. Then run:

advdef -z -4 file.tar.gz

This will create a standard gz file that can be read by gzip and tar as normal, just a tiny bit smaller. This is about the best you can do with the gz format.
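Since advdef can be buggy, you may want a safety check when running it by hand. The following is a rough sketch of that idea, not the actual gz99 script: it compares a checksum of the decompressed data before and after, and restores a backup on any mismatch. The function name recompress_checked and the .bak suffix are my own choices.

```shell
# Hypothetical helper: recompress a .gz file in place and keep the result
# only if the decompressed contents are unchanged.
# Usage: recompress_checked FILE [COMMAND...]
# COMMAND is run with FILE appended; it defaults to "advdef -z -4".
recompress_checked() {
    f=$1
    shift
    [ "$#" -gt 0 ] || set -- advdef -z -4

    before=$(gzip -dc "$f" | cksum)
    cp "$f" "$f.bak" || return 1

    "$@" "$f"

    after=$(gzip -dc "$f" 2>/dev/null | cksum)
    if [ "$before" = "$after" ]; then
        rm -f "$f.bak"
    else
        mv "$f.bak" "$f"   # restore the original on any mismatch
        return 1
    fi
}
```

For example, `recompress_checked file.tar.gz` runs advdef and leaves you with either the smaller file or the untouched original.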

Since you only recently learnt that tar can compress, and didn't say why you wanted the smallest ".tar.gz" file, you may be unaware that there are more efficient formats that can be used with tar files, such as xz. Generally, switching to a different format gives a vastly better improvement in compression than fiddling around with gzip options. The main disadvantage of xz is that it isn't as common as gzip, so the people you send the file to might have to install a new package. It also tends to be a bit slower, particularly when compressing. If this doesn't matter to you, and you really want the smallest tar file, try:

 tar cv path/to/data | xz -9 > file.tar.xz

Modern versions of tar, for example on Ubuntu 13.10, automatically detect the compression format when reading an archive. So even if you use xz compression you can still decompress as usual:

 tar xvf file.tar.xz
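If you want to convince yourself that the auto-detection works on your system, here is a small round-trip sketch. It assumes a tar that detects the compression format when reading a regular file, and a relative source path without ".." in it; roundtrip_check is a hypothetical name.

```shell
# Hypothetical helper: compress a directory through a pipe, extract it
# again without naming the compressor, and compare the two trees.
# Usage: roundtrip_check DIR [COMPRESSOR]   (COMPRESSOR defaults to xz)
roundtrip_check() {
    src=$1
    comp=${2:-xz}
    work=$(mktemp -d) || return 1

    tar cf - "$src" | "$comp" -9 > "$work/archive"

    mkdir "$work/out"
    # No z or J flag here: tar inspects the file to pick the decompressor.
    tar xf "$work/archive" -C "$work/out"

    diff -r "$src" "$work/out/$src"
    status=$?
    rm -rf "$work"
    return "$status"
}
```

Running `roundtrip_check path/to/data` prints nothing and returns 0 when the extracted copy matches the original.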

To give a quick idea of how these compression utilities compare, consider the effect of compressing patch-3.1.1 from the Linux kernel:

utility         cpu    format  size(bytes)
gzip -9         0.02s  gz      105,628
advdef -2       0.07s  gz      102,619
7z -mx=9 -tgzip 0.42s  gz      102,297
advdef -3       0.55s  gz      102,290
advdef -4       0.75s  gz      101,956
xz -9           0.03s  xz       91,064
xz -3e          0.15s  xz       90,996

In this trivial example, we see that to get the smallest gz we need advdef (though 7z -tgzip is almost as good and a lot less buggy). We also see that switching to xz gains us much more space than trying to squeeze the most out of the old gz format, without compression taking too long.
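To run a similar comparison on your own file, a sketch along these lines should do. It only covers gzip and xz, since those read stdin and write stdout in the same way; compare_sizes is a made-up helper, and it does not measure CPU time.

```shell
# Hypothetical helper: print the size in bytes that each compressor
# achieves on one file. Compressors that are not installed are skipped.
# Relies on word splitting of $cmd, so run it under sh or bash.
compare_sizes() {
    file=$1
    for cmd in "gzip -9" "xz -9" "xz -3e"; do
        tool=${cmd%% *}
        command -v "$tool" >/dev/null 2>&1 || continue
        size=$($cmd < "$file" | wc -c)
        printf '%-10s %s\n' "$cmd" "$((size))"
    done
}
```

Call it as `compare_sizes file.tar` on an uncompressed tar file to get one line per compressor.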