Which archiving method is better for compressing text files on Linux?

maximumcompression.com was last updated in June 2011 (this answer was updated in October 2015).
That website therefore does not mention
the current champion text compressor worldwide:

      cmix

Competitions/Benchmarks:

  • enwik6
    18.2% compression of the 1MB text file enwik6
  • Calgary
    17.6% compression of the 14 files of the Calgary corpus (3MB tar file)
  • Hutter Prize
    15.7% compression of the 100MB text file enwik8
    (but cmix is not the winner because it requires too much RAM, more than 20GB)
  • Silesia Open Source Compression Benchmark
    15.7% compression of the 202MB Silesia corpus
  • Large Text Compression Benchmark
    12.4% compression of the 1GB text file enwik9

Details:
Byron Knoll has been actively developing cmix as libre software (GPL) since 2013, based on the book Data Compression Explained by Matt Mahoney. Matt Mahoney also maintains some of the above benchmarks and offers ZPAQ (WP), a command-line incremental archiver.


If you prefer a more standard tool (requiring less RAM) I recommend:

      lrzip

lrzip is an evolution of rzip by Con Kolivas.
lrzip stands for two names: Long Range ZIP and Lzma RZIP.
lrzip is often better than xz (another popular compression tool).
Alexander Riccio also recommends lrzip.


My favorite is:

      zpaq

The "archiver expert", Matt Mahoney, has worked intensively on PAQ algorithms for ten years, and zpaq provides the best compromise between CPU/memory resources and compression level.

However, the latest zpaq version is often not packaged/available on recent distros :-(
I always compile it from source when I have a new machine and need a very good compressor: https://github.com/zpaq/zpaq

git clone https://github.com/zpaq/zpaq
cd zpaq
g++ -O3 -march=native -Dunix zpaq.cpp libzpaq.cpp -pthread -o zpaq

Normally, bz2 has a better compression ratio than gz, combined with better recoverability features.

OTOH, gz is faster.

xz is said to be even better than bz2, but I don't know the timing behaviour.
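
If you want to check the size/time trade-off on your own files rather than rely on hearsay, something like the following rough sketch may help. It assumes a POSIX shell; compare_compressors is just a hypothetical helper name, each tool runs at its default level, and timing is whole seconds only, so use a reasonably large file.

```shell
#!/bin/sh
# Rough comparison of gzip, bzip2 and xz on one file.
# compare_compressors is a hypothetical helper, not part of any of these tools.
compare_compressors() {
    file=$1
    printf '%-8s %12s %8s\n' tool bytes seconds
    printf '%-8s %12s %8s\n' original "$(wc -c < "$file" | tr -d ' ')" n/a
    for tool in gzip bzip2 xz; do
        # Skip any tool that is not installed on this machine.
        command -v "$tool" >/dev/null 2>&1 || continue
        start=$(date +%s)
        # -c writes to stdout, so the input file is left untouched.
        size=$("$tool" -c "$file" | wc -c | tr -d ' ')
        end=$(date +%s)
        printf '%-8s %12s %8s\n' "$tool" "$size" "$((end - start))"
    done
}

# When called with an argument, run the comparison on that file.
if [ "$#" -ge 1 ]; then
    compare_compressors "$1"
fi
```

Saved as, say, compare.sh, it could be run as `sh compare.sh /path/to/some.log`. Results depend heavily on the input, so test with files typical of what you actually archive.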


Maybe you could have a look at these benchmarks, especially the part testing log file compression.