What is the best compression algorithm for small 4 KB files?

If you want to "compress TCP packets", you might consider using an RFC-standard technique.

  • RFC1978 PPP Predictor Compression Protocol
  • RFC2394 IP Payload Compression Using DEFLATE
  • RFC2395 IP Payload Compression Using LZS
  • RFC3173 IP Payload Compression Protocol (IPComp)
  • RFC3051 IP Payload Compression Using ITU-T V.44 Packet Method
  • RFC5172 Negotiation for IPv6 Datagram Compression Using IPv6 Control Protocol
  • RFC5112 The Presence-Specific Static Dictionary for Signaling Compression (Sigcomp)
  • RFC3284 The VCDIFF Generic Differencing and Compression Data Format
  • RFC2118 Microsoft Point-To-Point Compression (MPPC) Protocol

There are probably other relevant RFCs I've overlooked.
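As a rough illustration of the per-packet approach these RFCs take, here is a minimal Node.js sketch in the spirit of IPComp (RFC3173) with DEFLATE (RFC2394). Each payload is compressed independently of all others, and if the result is not smaller, the original bytes are sent instead. The one-byte flag and the names `encodePayload`/`decodePayload` are hypothetical; real IPComp signals compression with its own header, which this sketch does not implement.

```javascript
const zlib = require('zlib')

// Hypothetical framing: 1 flag byte (0 = sent as-is, 1 = raw DEFLATE) + body.
// Real IPComp uses its own header instead; this only mimics the idea.
const RAW = 0
const DEFLATED = 1

// Compress one payload independently of all others (stateless, like IPComp),
// falling back to the original bytes when compression does not help.
function encodePayload (payload) {
  const compressed = zlib.deflateRawSync(payload)
  if (compressed.length < payload.length) {
    return Buffer.concat([Buffer.from([DEFLATED]), compressed])
  }
  return Buffer.concat([Buffer.from([RAW]), payload])
}

// Undo encodePayload: inflate only if the flag says the body was compressed.
function decodePayload (frame) {
  const body = frame.subarray(1)
  return frame[0] === DEFLATED ? zlib.inflateRawSync(body) : Buffer.from(body)
}

// Usage: a tiny payload is usually sent raw; a repetitive one gets compressed.
for (const text of ['hi', 'status=OK;'.repeat(20)]) {
  const payload = Buffer.from(text)
  const frame = encodePayload(payload)
  console.log(payload.length, '->', frame.length, frame[0] === DEFLATED ? 'deflated' : 'raw')
}
```

Note that the fallback costs one flag byte per packet in this sketch, so incompressible payloads grow by exactly one byte rather than by the DEFLATE framing overhead.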


Choose the fastest algorithm, since you probably care about doing this in real time. Generally, for small blocks of data, the algorithms compress about equally well (give or take a few bytes), mostly because they must transmit a dictionary or Huffman trees in addition to the payload.
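To check this on your own data, a quick Node.js sketch can time the codecs built into Node's `zlib` module and report their output sizes. The sample message and iteration count are placeholders, Brotli (`zlib.brotliCompressSync`) assumes Node.js 11.7 or later, and timings from a simple loop like this are only indicative.

```javascript
const zlib = require('zlib')

// Codecs available in Node's built-in zlib module (Brotli needs Node >= 11.7)
const codecs = {
  deflateRaw: (b) => zlib.deflateRawSync(b),
  gzip: (b) => zlib.gzipSync(b),
  brotli: (b) => zlib.brotliCompressSync(b)
}

// Return compressed size and average time per call (microseconds) for each codec
function benchmark (message, iterations = 1000) {
  const results = []
  for (const [name, compress] of Object.entries(codecs)) {
    const size = compress(message).length
    const start = process.hrtime.bigint()
    for (let i = 0; i < iterations; i++) compress(message)
    const micros = Number(process.hrtime.bigint() - start) / 1000 / iterations
    results.push({ name, original: message.length, size, micros })
  }
  return results
}

// Placeholder message; substitute real packets from your application
const sample = Buffer.from('temp=21.4,rh=55.0,wind=3.2,temp=21.5,rh=54.8,wind=3.0')
console.table(benchmark(sample))
```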

I highly recommend Deflate (used by zlib and Zip) for a number of reasons. The algorithm is quite fast, well tested, and freely usable (zlib itself is released under the permissive zlib license), and it is the only compression method required to be supported by Zip (as per the Info-ZIP appnote). Beyond the basics, when compressing a block would make it larger than the original data, Deflate can emit a STORE (uncompressed) block, which adds only about 5 bytes per block of data (a stored block holds at most 65,535 bytes).

Aside from STORE mode, Deflate supports two types of Huffman code tables: dynamic and fixed. With a dynamic table, the Huffman tree is transmitted as part of the compressed data; this is the most flexible option (for varying types of non-random data). The advantage of a fixed table is that it is known by all decoders and thus doesn't need to be included in the compressed stream. The decompression (Inflate) code is relatively simple. I've written both Java and JavaScript versions based directly on zlib, and they perform rather well.
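The STORE fallback and the fixed Huffman mode can both be observed from Node.js, whose built-in `zlib` module wraps zlib. In this sketch, random bytes stand in for incompressible data, and zlib's `Z_FIXED` strategy forces the fixed Huffman table; by default zlib picks stored, fixed, or dynamic per block, whichever comes out smallest. The sample text is a placeholder, and exact byte counts depend on the input and the zlib version.

```javascript
const crypto = require('crypto')
const zlib = require('zlib')

// 4 KB of random bytes is effectively incompressible, so zlib falls back to a
// stored (uncompressed) block: only a few bytes of framing on top of the data.
const random = crypto.randomBytes(4096)
const stored = zlib.deflateRawSync(random)
console.log('random:', random.length, '->', stored.length, 'bytes')

// Short, repetitive text: zlib's default block choice vs. forced fixed Huffman codes.
const text = Buffer.from('temp=21.4;temp=21.5;temp=21.6;temp=21.7;temp=21.8;')
const auto = zlib.deflateRawSync(text)
const fixed = zlib.deflateRawSync(text, { strategy: zlib.constants.Z_FIXED })
console.log('text:', text.length, 'default ->', auto.length, 'fixed ->', fixed.length)

// Inflate restores the original bytes whichever block type was used.
console.log(zlib.inflateRawSync(auto).equals(text), zlib.inflateRawSync(fixed).equals(text))
```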

The other compression algorithms mentioned have their merits. I prefer Deflate because of its runtime performance in both compression and, particularly, decompression.

A point of clarification: Zip is not a compression type; it is a container format. For packet compression, I would bypass Zip and use the deflate/inflate APIs provided by zlib directly.
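To see what the wrappers cost on a small message, Node.js exposes the bare DEFLATE stream (`deflateRawSync`), the zlib format (`deflateSync`, which adds a 2-byte header and a 4-byte Adler-32 checksum), and gzip (`gzipSync`, a 10-byte header and an 8-byte trailer). A minimal sketch with a placeholder message:

```javascript
const zlib = require('zlib')

// Placeholder message; substitute a real packet payload
const message = Buffer.from('temp=21.4,rh=55.0,wind=3.2')

// Same DEFLATE stream, three different amounts of framing around it
const raw = zlib.deflateRawSync(message)   // bare DEFLATE (RFC 1951)
const wrapped = zlib.deflateSync(message)  // zlib format (RFC 1950): raw + 6 bytes
const gz = zlib.gzipSync(message)          // gzip format (RFC 1952): raw + 18 bytes

console.log({ original: message.length, raw: raw.length, zlib: wrapped.length, gzip: gz.length })

// The receiver must use the inflate call that matches the framing chosen.
const restored = zlib.inflateRawSync(raw)
console.log(restored.equals(message))
```

For packets where both ends are under your control, the raw form saves those bytes, at the cost of losing the built-in checksum.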


(Figure: scatter plot, compression of ASCII messages)

This is a follow-up to Rick's excellent answer, which I've upvoted. Unfortunately, I couldn't include an image in a comment.

I ran across this question and decided to try deflate on a sample of 500 ASCII messages that ranged in size from 6 to 340 bytes. Each message is a bit of data generated by an environmental monitoring system that gets transported via an expensive (pay-per-byte) satellite link.

The most amusing observation is that the crossover point, above which messages are smaller after compression, is the same as the Answer to the Ultimate Question of Life, the Universe, and Everything: 42 bytes.
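If you log messages from your own system, a crossover length can be estimated in a similar way. This is a minimal Node.js sketch, not the author's script: `findCrossover` is a hypothetical helper that returns the smallest length L such that every message of length L or more is smaller after raw DEFLATE, and the synthetic readings below are placeholders for real data. Expect the result to vary with your message format.

```javascript
const zlib = require('zlib')

// Hypothetical helper: smallest length L such that every message of length >= L
// shrinks under raw DEFLATE, or null if the longest messages still do not shrink.
function findCrossover (messages) {
  const points = messages.map((m) => {
    const buf = Buffer.from(m)
    return { original: buf.length, deflated: zlib.deflateRawSync(buf).length }
  })

  // Longest message that did NOT get smaller (-1 if every message shrank)
  let worst = -1
  for (const p of points) {
    if (p.deflated >= p.original) worst = Math.max(worst, p.original)
  }

  // Every message longer than `worst` shrank; the shortest of them is the crossover
  const winners = points.filter((p) => p.original > worst).map((p) => p.original)
  return winners.length ? Math.min(...winners) : null
}

// Placeholder data: comma-separated readings of increasing length
const samples = []
for (let n = 1; n <= 40; n++) {
  samples.push(Array.from({ length: n }, (_, i) => `t${i}=${(20 + i / 10).toFixed(1)}`).join(','))
}
console.log('crossover length:', findCrossover(samples))
```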

To try this out on your own data, here's a little bit of Node.js to help:

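const zlib = require('zlib')
const sprintf = require('sprintf-js').sprintf  // npm install sprintf-js

// Replace with one of your own messages (passed as the first command-line argument)
const data_packet = Buffer.from(process.argv[2] || 'sample message')
const inflate_len = data_packet.length                       // original size in bytes
const deflate_len = zlib.deflateRawSync(data_packet).length   // raw DEFLATE size, no zlib/gzip header
// Size change in percent: negative means the message got smaller
const delta = Math.round((deflate_len - inflate_len) / inflate_len * 100)
console.log(`inflated,deflated,delta(%)`)
console.log(sprintf(`%03i,%03i,%3i`, inflate_len, deflate_len, delta))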

Tags:

Compression