Estimate compressibility of file

You could try compressing one out of every 10 blocks, for instance, to get an idea:

perl -MIPC::Open2 -nE 'BEGIN{$/=\4096;open2(\*I,\*O,"gzip|wc -c")}
                       if ($. % 10 == 1) {print O $_; $l+=length}
                       END{close O; $c = <I>; say $c/$l}'

(here with 4K blocks; the printed number is the ratio of compressed size to original size for the sampled blocks).


Here's a (hopefully equivalent) Python version of Stephane Chazelas's solution:

python -c "
import zlib
from itertools import islice
from functools import partial
import sys
with open(sys.argv[1], 'rb') as f:
  compressor = zlib.compressobj()
  t, z = 0, 0.0
  # feed every 10th 4096-byte chunk through a single zlib stream
  for chunk in islice(iter(partial(f.read, 4096), b''), 0, None, 10):
    t += len(chunk)
    z += len(compressor.compress(chunk))
  z += len(compressor.flush())
  print(z/t)
" file

I had a multi-gigabyte file and wasn't sure whether it was already compressed, so I test-compressed the first 10 MB:

head -c 10000000 large_file.bin | gzip | wc -c

It's not perfect, but it worked well for me.
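
If you want a ratio rather than a raw byte count, the same first-10 MB check is easy to do in Python; here's a rough sketch (it uses zlib's default level, so the size won't exactly match gzip's output, and it assumes a non-empty file):

import sys, zlib

# usage: python3 ratio10mb.py large_file.bin   (script name is just an example)
# Compress only the first 10 MB and print compressed/original size ratio,
# mirroring the head -c 10000000 | gzip | wc -c pipeline above.
with open(sys.argv[1], "rb") as f:
    data = f.read(10000000)
print(len(zlib.compress(data)) / len(data))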