Best compression method?

The default is gz, but I get the best results with 7z.

Here are the results for a 1.4 GB VirtualBox container:

Best compression – size in MB:

7z 493
rar 523
bz2 592
lzh 607
gz 614
Z 614
zip 614
arj 615
lzo 737
zoo 890

Source

Install

 sudo apt-get install p7zip-full
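
Once p7zip-full is installed, usage might look like the sketch below. The helper names and the directory name are placeholders of mine, not part of the answer. -mx=9 selects the strongest (and slowest) preset. The 7z format does not store Unix owner/group information, so the sketch pipes a tar stream into 7z instead of adding the directory directly.

```shell
# pack_7z DIR: hypothetical helper, writes DIR.tar.7z in the current directory.
# tar keeps permissions and ownership; 7z only compresses the stream (-si = read stdin).
pack_7z() {
    tar -cf - "$1" | 7z a -si -t7z -mx=9 "$1.tar.7z"
}

# unpack_7z ARCHIVE: decompress to stdout (-so) and let tar restore the tree.
unpack_7z() {
    7z x -so "$1" | tar -xf -
}
```

For example, `pack_7z mydata` followed later by `unpack_7z mydata.tar.7z` (pass the directory name without a trailing slash).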

This question is very old, but perhaps somebody will find this solution useful:

Use rzip after tar. It first removes long-range redundancy with a dictionary method, working on data blocks of up to 900 MB, and then hands the reduced data over to bzip2. It is much faster than the other strong compression tools (bzip2, lzma), and it compresses some files even better than bzip2 or lzma.
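
A minimal sketch of the tar-then-rzip workflow. The helper names are hypothetical. rzip needs a regular file rather than a pipe, which is why the tarball is written to disk first; by default rzip replaces its input with a .rz file.

```shell
# pack_rzip DIR: hypothetical helper, produces DIR.tar.rz.
pack_rzip() {
    tar -cf "$1.tar" "$1" &&   # bundle the directory, uncompressed
    rzip -9 "$1.tar"           # strongest level; DIR.tar becomes DIR.tar.rz
}

# unpack_rzip DIR.tar.rz: restore the tarball, then extract it.
unpack_rzip() {
    rzip -d "$1" &&            # DIR.tar.rz becomes DIR.tar again
    tar -xf "${1%.rz}"
}
```

Add -k to either rzip call if you want to keep the input file.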

Yes, gz is the default compression tool on Linux. It is fast, and despite its age it still gives very good results when compressing text files such as source code. Another standard tool is bzip2, though it is much slower.

Addition: lrzip is newer and extends the principle of rzip. It even supports unlimited block sizes and a choice of compression methods (LZMA, Bzip2, Gzip, LZO, ZPAQ or none). LZMA is the default. For backups, or if you share a lot of data with other Linux/BSD users, it can come in really handy.
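
To illustrate the choice of methods, here is a small sketch. The helper names are mine, and the back-end flags are the ones I believe the lrzip manual documents (-z ZPAQ, -b bzip2, -g gzip, -l LZO, -n no back end), so check `man lrzip` on your system before relying on them.

```shell
# pack_lrzip DIR [FLAG]: hypothetical helper, produces DIR.tar.lrz.
# FLAG selects the back end (-z, -b, -g, -l or -n); omit it for the default LZMA.
pack_lrzip() {
    tar -cf "$1.tar" "$1" &&
    lrzip ${2:+"$2"} "$1.tar" &&   # unlike rzip, lrzip keeps its input by default
    rm -f "$1.tar"
}

# unpack_lrzip DIR.tar.lrz: restore the tarball, then extract it.
unpack_lrzip() {
    lrzip -d "$1" &&               # writes DIR.tar next to the archive
    tar -xf "${1%.lrz}"
}
```

For example, `pack_lrzip mydata -z` trades speed for the strongest compression. The lrzip package also ships lrztar and lrzuntar wrappers that do the tar step for you.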


I opt for LZMA. It has the smallest byte overhead and a strong compression ratio. To compare ZIP and LZMA, I generated two files: seq.txt with the PHP code

$s = '0123456789'; $str = ''; for ($i=0; $i < 1000000; $i++) $str .= $s[$i%10].($i%10==9 ? "\n":""); file_put_contents('seq.txt', $str);

which holds repeating blocks of the digits 0..9 (about 1 MB of data), and rnd.txt with the PHP code

$s = '0123456789'; $str = ''; for ($i=0; $i < 1000000; $i++) $str .= $s[rand(0,9)].($i%10==9 ? "\n":""); file_put_contents('rnd.txt', $str);

which holds random blocks of the digits 0..9 (about 1 MB of data).

Compression results:

  • seq.txt, rnd.txt - 1100000 bytes each
  • seq.txt.zip - 2502 bytes
  • rnd.txt.zip - 515957 bytes
  • seq.txt.lzma - 257 bytes
  • rnd.txt.lzma - 484939 bytes

Compression ratio (share of space saved, 1 - compressed size / original size):

  • ZIP  -> "seq.txt" -> 99.772%
  • ZIP  -> "rnd.txt" -> 53.094%
  • LZMA -> "seq.txt" -> 99.976%
  • LZMA -> "rnd.txt" -> 55.914%

So LZMA compressed the sequential data about 0.2 percentage points better than ZIP (its output is nearly ten times smaller, 257 vs. 2502 bytes)
and the random data about 2.8 percentage points better than ZIP.

LZMA clearly wins!
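
The percentages above can be rechecked from the byte counts. This is a small sketch using awk; the function name is mine, and "saved" is taken to mean 1 - compressed/original, which appears to be how the figures in the list were computed.

```shell
# saved_pct ORIGINAL COMPRESSED: percentage of space saved, two decimals.
saved_pct() {
    awk -v o="$1" -v c="$2" 'BEGIN { printf "%.2f\n", (1 - c / o) * 100 }'
}

saved_pct 1100000 2502     # seq.txt, ZIP
saved_pct 1100000 257      # seq.txt, LZMA
saved_pct 1100000 515957   # rnd.txt, ZIP
saved_pct 1100000 484939   # rnd.txt, LZMA
```

Rounded to two decimals this prints 99.77, 99.98, 53.09 and 55.91, in line with the list (which looks truncated to three decimals rather than rounded).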

Tags:

Compression