How to compress a sequence of non-repeated numbers of N bits each?

As pointed out in the comments, the optimal encoding, assuming all permutations are equally probable, is to replace the entire permutation with its index in the enumeration of permutations. Since there are n! possible permutations, the index requires ⌈log₂(n!)⌉ bits, so the size relative to the naive encoding, which uses log₂ n bits for each element, is (log n!)/(n log n).

Using Stirling's approximation (with natural logarithms), we can rewrite that as (n log n - n + O(log n))/(n log n), which is 1 - 1/(log n) + O(1/n) and approaches 1 as n grows. So the achievable saving inevitably shrinks for larger n.
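To see how quickly the ratio approaches 1, here is a minimal sketch that compares the exact ratio with the leading Stirling terms. The function names are mine, and it relies on double-precision `std::lgamma`, so treat the values as approximate.

```cpp
#include <cassert>
#include <cmath>

// Exact ratio ln(n!) / (n ln n). The base of the logarithm cancels,
// so this equals log2(n!) / (n log2 n). lgamma(n + 1) is ln(n!).
double exact_ratio(double n) {
    assert(n >= 2.0);
    return std::lgamma(n + 1.0) / (n * std::log(n));
}

// Leading terms of the Stirling estimate: 1 - 1/ln(n).
double stirling_ratio(double n) {
    assert(n >= 2.0);
    return 1.0 - 1.0 / std::log(n);
}
```

For n = 16 the exact ratio is roughly 0.69, and for n = 2048 it is roughly 0.87, where the two functions already agree to about three decimal places.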

Better compression is possible only if the permutations are not all equally probable (and you have some information about the probability distribution).

For this specific problem, the most efficient encoding is to compute the Lehmer code of the permutation of [0 .. 2^N-1], read it as a numeral in the factorial number system, and store the resulting number.
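A minimal sketch of that encoding, assuming the input really is a permutation of 0..n-1 and that n is at most 20 so the result fits in a `std::uint64_t` (which covers N = 4). The function names are illustrative, and the quadratic loop is chosen for clarity rather than speed; larger N would need a big-integer type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Lehmer code: digit i counts how many later elements are smaller than perm[i].
// O(n^2) for clarity; assumes perm is a permutation of 0..n-1.
std::vector<std::uint32_t> lehmer_code(const std::vector<std::uint32_t>& perm) {
    const std::size_t n = perm.size();
    std::vector<std::uint32_t> code(n, 0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            if (perm[j] < perm[i]) ++code[i];
    return code;
}

// Read the Lehmer code as a factorial-base numeral and evaluate it.
// The result lies in [0, n!) and fits in 64 bits only for n <= 20.
std::uint64_t permutation_rank(const std::vector<std::uint32_t>& perm) {
    const std::vector<std::uint32_t> code = lehmer_code(perm);
    const std::size_t n = code.size();
    assert(n <= 20);
    std::uint64_t rank = 0;
    for (std::size_t i = 0; i < n; ++i)
        rank = rank * (n - i) + code[i];  // Horner's rule with mixed radix n - i
    return rank;
}
```

The identity permutation gets rank 0 and the fully reversed one gets n! - 1, so every rank below n! is used exactly once.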

This gives a requirement of ceil(log2((2^N)!)) bits. For N = 4, this uses 45 bits (70.3% of the naive 64 bits); for N = 11 (2^N = 2048), 19581 bits (86.9% of the naive 22528 bits).
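These figures can be reproduced with a short sketch. The function names are hypothetical, and the computation uses double-precision `std::lgamma` rather than exact big-integer arithmetic, which is adequate for small N but is an assumption you should check for larger ones.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Bits needed to index all permutations of 2^N distinct values:
// ceil(log2((2^N)!)), computed via lgamma(x + 1) = ln(x!).
// N = 1 is excluded because 2! is an exact power of two, where
// floating-point rounding could push ceil() one bit too high.
std::uint64_t min_permutation_bits(unsigned N) {
    assert(N >= 2 && N <= 30);
    const double count = std::ldexp(1.0, static_cast<int>(N));  // 2^N
    const double log2_factorial = std::lgamma(count + 1.0) / std::log(2.0);
    return static_cast<std::uint64_t>(std::ceil(log2_factorial));
}

// Fraction of the naive N * 2^N bits that the optimal encoding still needs.
double compressed_fraction(unsigned N) {
    const double naive_bits =
        static_cast<double>(N) * std::ldexp(1.0, static_cast<int>(N));
    return static_cast<double>(min_permutation_bits(N)) / naive_bits;
}
```

With these, `min_permutation_bits(4)` gives 45 and `min_permutation_bits(11)` gives 19581, matching the numbers above.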

The compression ratio worsens as N increases. Using the simple bound ln x! >= x ln x - x + 1 with x = 2^N, we obtain a lower bound for log2((2^N)!) / (N 2^N) of 1 - ((2^N - 1)/(2^N)) * (1 / (N * ln 2)), which approaches 1 as N tends to infinity.
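For anyone who wants the intermediate steps, the bound can be worked out as follows. The integral comparison is my way of justifying the inequality; the original only states the result.

```latex
% ln is increasing, so each term ln k is at least the integral over [k-1, k]:
\ln x! \;=\; \sum_{k=2}^{x} \ln k \;\ge\; \int_{1}^{x} \ln t \,dt \;=\; x\ln x - x + 1 .

% Substitute x = 2^N, so that \ln x = N \ln 2:
\frac{\log_2\!\big((2^N)!\big)}{N\,2^N}
  \;=\; \frac{\ln\!\big((2^N)!\big)}{N\,2^N \ln 2}
  \;\ge\; \frac{2^N N \ln 2 - 2^N + 1}{N\,2^N \ln 2}
  \;=\; 1 - \frac{2^N - 1}{2^N}\cdot\frac{1}{N \ln 2} .
```

For N = 15 the right-hand side is about 0.904.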

Given this absolute bound on the compression ratio, any approach you can find which is reasonably efficient is worth going for; already at N = 15, no encoding can get below 90% of the naive size.

Currently you are using N*2^N bits.

Basically, what you have is a permutation of the numbers, and for each permutation you can calculate a unique identifier. Since there are (2^N)! permutations, you will only need ceil(log2((2^N)!)) bits. For your example, this is 45 bits.
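Because the identifier is unique, it can also be turned back into the permutation. Here is a minimal sketch of that decoding step, assuming the identifier is the lexicographic index of the permutation and that n is at most 20 so it fits in 64 bits. The function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rebuild the permutation of 0..n-1 whose lexicographic index is `rank`.
// Works in 64 bits for n <= 20; larger n would need a big-integer type.
std::vector<std::uint32_t> permutation_from_rank(std::uint64_t rank, std::size_t n) {
    assert(n <= 20);
    // Peel off the factorial-base digits, least significant first.
    std::vector<std::uint32_t> digits(n, 0);
    for (std::size_t radix = 1; radix <= n; ++radix) {
        digits[n - radix] = static_cast<std::uint32_t>(rank % radix);
        rank /= radix;
    }
    assert(rank == 0);  // otherwise the identifier was >= n!

    // Digit i selects the digits[i]-th smallest value that is still unused.
    std::vector<std::uint32_t> unused(n);
    for (std::size_t v = 0; v < n; ++v) unused[v] = static_cast<std::uint32_t>(v);

    std::vector<std::uint32_t> perm;
    perm.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const auto pos = static_cast<std::ptrdiff_t>(digits[i]);
        perm.push_back(unused[digits[i]]);
        unused.erase(unused.begin() + pos);
    }
    return perm;
}
```

For example, `permutation_from_rank(23, 4)` yields 3, 2, 1, 0, the last of the 24 permutations of four values.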


Tags: Algorithm, C++, Compression
