Which file compression software for Linux offers the highest size reduction?

lrzip is what you're really looking for, especially if you're compressing source code!

Quoting the README:

This is a compression program optimised for large files. The larger the file and the more memory you have, the better the compression advantage this will provide, especially once the files are larger than 100MB. The advantage can be chosen to be either size (much smaller than bzip2) or speed (much faster than bzip2). [...] The unique feature of lrzip is that it tries to make the most of the available ram in your system at all times for maximum benefit.

lrzip works by first scanning for and removing any long-distance data redundancy with an rzip-based algorithm, then compressing the non-redundant data.
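A small experiment shows why long-distance redundancy matters. This is not lrzip's algorithm. It is a sketch that builds a 4 MiB file whose second half repeats the first half 2 MiB back. gzip's 32 KiB window cannot reach that far, but xz -6 (8 MiB dictionary) can. lrzip's rzip-style pass is meant to find repeats across much larger distances than either. The sketch assumes GNU coreutils, gzip and xz are installed. The lrzip step runs only if lrzip is present, and all file names are placeholders.

```shell
#!/bin/sh
# Build a file whose second half is an exact copy of its first half.
workdir=$(mktemp -d)
head -c 2097152 /dev/urandom > "$workdir/chunk"            # 2 MiB of incompressible data
cat "$workdir/chunk" "$workdir/chunk" > "$workdir/sample"   # 4 MiB; the repeat starts 2 MiB back

orig_size=$(wc -c < "$workdir/sample")
# Deflate (gzip) only looks 32 KiB back, so it cannot see the repeat
gz_size=$(gzip -9 -c "$workdir/sample" | wc -c)
# xz -6 uses an 8 MiB dictionary, which reaches back past the 2 MiB distance
xz_size=$(xz -6 -c "$workdir/sample" | wc -c)

echo "original: $orig_size bytes"
echo "gzip -9:  $gz_size bytes"
echo "xz -6:    $xz_size bytes"

# Optional: lrzip's rzip-style first pass targets exactly this kind of repeat
if command -v lrzip >/dev/null 2>&1; then
    lrzip -q -o "$workdir/sample.lrz" "$workdir/sample"
    echo "lrzip:    $(wc -c < "$workdir/sample.lrz") bytes"
fi
rm -rf "$workdir"
```

Expect gzip's output to be roughly the size of the input, and xz's output to be roughly half of it.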

Con Kolivas, lrzip's author, gives a fantastic example on the Linux Kernel Mailing List, in which he compresses a 10.3GB tarball of forty Linux kernel releases down to 163.9MB (1.6%), and does so faster than xz. He wasn't even using the most aggressive second-pass algorithm!

I'm sure you'll have great results compressing massive tarballs of source code :)

sudo apt-get install lrzip

Example (using the defaults for all other options):

Ultra compression with the ZPAQ backend (dog slow):

lrzip -z file

For directories, just replace lrzip with lrztar.
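The commands above can be combined into a full round trip. This is a sketch, assuming lrzip and lrztar are on the PATH, and it does nothing if they are not. The file and directory names (numbers.txt, project) are hypothetical stand-ins. The -q (quiet), -d (decompress) and -o (output file) options are lrzip's own. By default, lrzip keeps the source file and writes a .lrz file next to it.

```shell
#!/bin/sh
# Round-trip sketch for lrzip and lrztar; names below are placeholders.
lrzip_status=skipped
if command -v lrzip >/dev/null 2>&1 && command -v lrztar >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    cd "$workdir"
    mkdir project
    seq 1 100000 > project/numbers.txt          # stand-in data
    cp project/numbers.txt numbers.txt
    cp numbers.txt zpaq.txt

    lrzip -q numbers.txt                        # default backend, writes numbers.txt.lrz
    lrzip -q -z zpaq.txt                        # ZPAQ backend: ultra compression, very slow
    lrzip -q -d -o restored.txt numbers.txt.lrz # decompress to a new name
    lrzip -q -d -o restored-zpaq.txt zpaq.txt.lrz

    lrztar project                              # directory -> project.tar.lrz

    if cmp -s numbers.txt restored.txt && cmp -s numbers.txt restored-zpaq.txt \
        && [ -s project.tar.lrz ]; then
        lrzip_status=ok
    else
        lrzip_status=failed
    fi
    ls -l numbers.txt numbers.txt.lrz zpaq.txt.lrz project.tar.lrz
    cd - >/dev/null
    rm -rf "$workdir"
fi
echo "lrzip round trip: $lrzip_status"
```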


7zip is more of an archiver (like PKZIP) than a compressor. It's available for Linux, but it can only create compressed archives in regular files; it can't compress a stream, for instance. It also can't store most Unix file attributes, such as ownership, ACLs, extended attributes, or hard links.

On Linux, as a compressor, you've got xz, which uses the same compression algorithm as 7zip (LZMA2). You can use it to compress tar archives.
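The usual combination is tar for the archive format and xz for compression, connected by a pipe. This is a sketch assuming GNU tar and xz. The directory src and its files are hypothetical. The sketch shows two things: xz compresses a stream with no intermediate file, and tar keeps Unix attributes such as permission bits and hard links.

```shell
#!/bin/sh
# Sketch: tar builds the archive and xz compresses it as a stream.
workdir=$(mktemp -d)
mkdir "$workdir/src"
printf 'hello\n' > "$workdir/src/a.txt"
chmod 640 "$workdir/src/a.txt"
ln "$workdir/src/a.txt" "$workdir/src/b.txt"    # hard link, which tar records as such

# tar writes to stdout, xz reads stdin: no intermediate .tar file on disk
tar -C "$workdir" -cf - src | xz -9 > "$workdir/src.tar.xz"

# Same result in one command: -J tells GNU tar to run xz itself
tar -C "$workdir" -cJf "$workdir/src-J.tar.xz" src

# Decompress back into a stream and list it; -v shows modes, owners and hard links
listing=$(xz -dc "$workdir/src.tar.xz" | tar -tvf -)
printf '%s\n' "$listing"
hardlinks=$(printf '%s\n' "$listing" | grep -c ' link to ')
rm -rf "$workdir"
```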

As with gzip and bzip2 (pigz and pbzip2), there's a parallel variant, pixz, that can use several processors to speed up compression (xz itself can also do this natively since version 5.2.0, with the -T option). pixz also supports indexing a compressed tar archive, which means it can extract a single file without having to decompress the archive from the start.
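Both approaches are shown below. This is a sketch: it assumes xz 5.2.0 or later for -T0, which starts one thread per CPU core. The pixz commands follow the usage described in pixz's README as best I recall it (tar -Ipixz, pixz -l to list, pixz -x to extract a member), and they run only if pixz is installed. All paths are hypothetical.

```shell
#!/bin/sh
# Sketch: parallel xz compression, plus pixz indexing when pixz is available.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/dir"
seq 1 200000 > "$workdir/src/dir/file.txt"      # stand-in data
seq 1 1000 > "$workdir/src/other.txt"

# xz >= 5.2.0: -T0 starts one worker thread per CPU core
tar -C "$workdir" -cf - src | xz -T0 -6 > "$workdir/src.tar.xz"
xz_ok=no
xz -t "$workdir/src.tar.xz" && xz_ok=yes         # integrity check of the .xz file

pixz_ok=skipped
if command -v pixz >/dev/null 2>&1; then
    # tar delegates compression to pixz, which also stores an index of tar members
    tar -C "$workdir" -Ipixz -cf "$workdir/src.tpxz" src
    pixz -l "$workdir/src.tpxz"                   # list members from the index
    # pull out one member without decompressing the archive from the start
    mkdir "$workdir/out"
    pixz -x src/dir/file.txt < "$workdir/src.tpxz" | tar -C "$workdir/out" -xf -
    if cmp -s "$workdir/src/dir/file.txt" "$workdir/out/src/dir/file.txt"; then
        pixz_ok=yes
    else
        pixz_ok=no
    fi
fi
echo "xz -T0: $xz_ok, pixz single-file extract: $pixz_ok"
rm -rf "$workdir"
```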


If you're looking for greatest size reduction regardless of compression speed, LZMA is likely your best option.

When comparing compressors, the tradeoff is generally time vs. size. gzip compresses and decompresses relatively quickly while yielding a good compression ratio. bzip2 is somewhat slower than gzip at both compression and decompression, but achieves higher compression ratios. LZMA has the longest compression time but yields the best ratios, and it still decompresses faster than bzip2.
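To check this tradeoff on your own data, you can run a rough side-by-side test at level -9. This is a sketch with several assumptions. The sample is generated text, so substitute your own files for meaningful numbers. Timing relies on GNU date's %N (nanoseconds). Each tool is skipped if it is not installed. The helper name bench is hypothetical. Timings are single wall-clock runs, so expect noise.

```shell
#!/bin/sh
# Rough size/time comparison of gzip, bzip2 and xz (LZMA2) at level -9.
sample=$(mktemp)
seq 1 200000 | awk '{ print "record", $1, ($1 * 7919) % 10007 }' > "$sample"  # stand-in data
orig_size=$(wc -c < "$sample")
echo "original: $orig_size bytes"
failures=0

# Hypothetical helper: compress with one tool, report size and time, verify the round trip
bench() {
    tool=$1
    start=$(date +%s%N)                           # GNU date: nanoseconds since the epoch
    "$tool" -9 -c "$sample" > "$sample.$tool"
    end=$(date +%s%N)
    size=$(wc -c < "$sample.$tool")
    "$tool" -dc "$sample.$tool" | cmp -s - "$sample" || failures=$((failures + 1))
    echo "$tool -9: $size bytes, $(( (end - start) / 1000000 )) ms"
    eval "${tool}_size=$size"                     # sets gzip_size, bzip2_size, xz_size
}

for tool in gzip bzip2 xz; do
    if command -v "$tool" >/dev/null 2>&1; then
        bench "$tool"
    fi
done
rm -f "$sample" "$sample".*
```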

Sources: http://bashitout.com/2009/08/30/Linux-Compression-Comparison-GZIP-vs-BZIP2-vs-LZMA-vs-ZIP-vs-Compress.html

http://tukaani.org/lzma/benchmarks.html
