Does data compression require energy?

Your question is difficult to answer because hard drives are not designed to have zero energy loss, so the hardware you describe cannot possibly get close to the lower energy bounds you want to talk about. So in general the answer is "yes, it takes energy." Still, one can look at what would have to happen for no energy to be used.

First off, your computer would have to be a reversible computer, and we would need a zero-energy way of physically permuting the states. In practice, because hard drives operate at a temperature higher than absolute zero, if it took zero energy to permute the states, then thermal noise would permute them too, scrambling the data shortly after writing.

But suppose we worked with that. While I do not believe there is any theoretical device that would work the way you need it to work, we can handwave that for a moment. Now you have a problem. For this data to be "classical," where bits are either 0 or 1 and not a superposition of both, we are going to need to "read" the permutation out of the reversible computer. This is where you would see a $k_BT\ln 2$ term show up, once for each bit of information you read out of the reversible computer. The actual compression process might have been free because it was reversible, but the final output was a classical measurement.

To get rid of that, we would need to have the "hard drive" designed to operate in a quantum sense as well. In this case, the reversible computer would be coupled to the hard drive in a way which leaves the qubits on the hard drive in a state which, if observed, has basically zero probability of being measured as the wrong value. However, in doing so, you would have to ensure you don't erase the original state. This means the unknowns of what was on the hard drive in the first place would need to be stored in the system (probably in the computer part). What you would have described is basically qubits of memory, and they exist until the system falls out of coherence.

If I skip ahead one: what physically happens to $\sigma$? Nothing. It was never a physical thing in the first place.

Finally, the in-practice question. The answer is "a whole lot more." Modern computers are so far from ideal that it is almost pointless to try to compare them. For this, I downloaded a copy of the complete works of Shakespeare. I made 192 copies of this, which was very close to a gigabyte, and ran them through the Linux command "time tar -cz". I did this several times, and it came in at roughly 1 minute each time (and for the curious, it compressed it to 37.8% of its size).
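For anyone who wants to repeat a measurement like this without tar, here is a minimal Python sketch of the same idea: compress a buffer with gzip and time it. The function name compress_and_time and the repeated sample sentence are my own placeholders, not the Shakespeare corpus used above, and level 6 is assumed here because it is the gzip command-line default.

```python
import gzip
import time
from typing import Tuple


def compress_and_time(data: bytes, level: int = 6) -> Tuple[int, float]:
    """Return (compressed size in bytes, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    compressed = gzip.compress(data, compresslevel=level)
    elapsed = time.perf_counter() - start
    return len(compressed), elapsed


if __name__ == "__main__":
    # Placeholder input: one sentence repeated many times (an assumption,
    # far more repetitive than real prose, so the ratio will look too good).
    sample = b"To be, or not to be, that is the question.\n" * 100_000
    size, seconds = compress_and_time(sample)
    percent = 100.0 * size / len(sample)
    print(f"{len(sample)} -> {size} bytes ({percent:.1f}%) in {seconds:.3f} s")
```

Timings from a sketch like this will depend heavily on the machine and on how repetitive the input is, so treat the numbers as rough.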

What are the takeaways here?

  • This pegged one of my CPUs at 100%. I could have parallelized it, but that was overkill. I know that, in practice, hard drives are much slower than CPUs, so I think there's a good chance that my computer held the entire file in memory... because we pay the OS developers to do smart things like that.
  • An Arduino would be far slower than my computer. I'm not going to post my specs, because this isn't a benchmark - it's a theoretical computation gone wrong. But it does suggest that I would have to spend about 17 hours compressing 1 TB of text via this method. That's a lot of bicycling.
  • My CPU pulls about 16 W while compressing this data (2.4 W at idle, for those who are curious). That's less than the whole computer draws, and a whole computer is needed for this process, but I don't have my ammeter hooked up at the moment, so I couldn't measure that.
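The arithmetic behind those takeaways can be sketched in a few lines. This assumes the roughly 1 GB per minute rate scales linearly, takes 1 TB as 1000 GB, and counts only the 16 W CPU draw; the function names are just illustrative.

```python
def scale_time_hours(seconds_per_gb: float, gigabytes: float) -> float:
    """Linear extrapolation of compression time, returned in hours."""
    return seconds_per_gb * gigabytes / 3600.0


def energy_joules(power_watts: float, seconds: float) -> float:
    """Energy is power multiplied by time (1 W for 1 s is 1 J)."""
    return power_watts * seconds


# Figures from the text: about 60 s per GB and about 16 W under load.
hours_for_1tb = scale_time_hours(60.0, 1000.0)  # 1 TB taken as 1000 GB
cpu_energy = energy_joules(16.0, 60.0)          # CPU only, one 1 GB run

print(f"{hours_for_1tb:.1f} h")  # prints "16.7 h"
print(f"{cpu_energy:.0f} J")     # prints "960 J"
```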

So what was the theoretical energy usage? $k_BT\ln 2$ per bit. At 315 K, and flipping on average 190,000,000 bits, that's about $6\cdot10^{-13}\ \text{J}$. What was my actual energy usage? 16 W for 60 s, or 960 J. So currently my computer is about 1,000,000,000,000,000 times less efficient than the ideal!
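Here is a small sketch of that comparison, in case anyone wants to plug in their own numbers. It assumes the Landauer cost $k_BT\ln 2$ is paid once per flipped bit and takes the bit count, temperature, and 960 J figure from the text; the function name landauer_energy is my own.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact in the SI)


def landauer_energy(bits: float, temperature_k: float) -> float:
    """Minimum energy in joules to irreversibly process `bits` bits at T."""
    return bits * K_B * temperature_k * math.log(2)


bound = landauer_energy(190_000_000, 315.0)  # figures quoted in the text
measured = 16.0 * 60.0                       # 16 W for 60 s, in joules
ratio = measured / bound

print(f"Landauer bound: {bound:.1e} J")
print(f"Measured CPU energy: {measured:.0f} J")
print(f"Measured / bound: {ratio:.1e}")
```

The bound comes out a little under $10^{-12}$ J and the ratio on the order of $10^{15}$, which is the gap described above.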


In theory it takes absolutely zero energy to permute bits around, as long as you use reversible computation. Landauer's limit only applies to irreversible processes where you can't reconstruct the input from the output, such as applying AND or OR gates, or erasing bits whose values were originally unknown.

Permutations are reversible operations, so they don't have to cost any energy. There are many examples of reversible computation, such as the billiard ball computer, where you can see this explicitly, though they're all extremely impractical. In a real CPU, permutations are implemented through a series of irreversible operations, so they would necessarily cost energy. But the actual energy needed exceeds the Landauer limit by many orders of magnitude.
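To make the reversible versus irreversible distinction concrete, here is a minimal sketch that checks whether a gate maps distinct inputs to distinct outputs, which is the condition for being able to reconstruct the input. The helper is_reversible and the three toy gates are my own illustrations; a two-bit swap stands in for a general permutation.

```python
from itertools import product


def is_reversible(gate, n_inputs):
    """True if `gate` sends distinct n-bit inputs to distinct outputs."""
    inputs = list(product((0, 1), repeat=n_inputs))
    outputs = {gate(*bits) for bits in inputs}
    return len(outputs) == len(inputs)


def and_gate(a, b):
    return (a & b,)  # two bits in, one bit out: information is lost


def or_gate(a, b):
    return (a | b,)  # same problem as AND


def swap_gate(a, b):
    return (b, a)    # a permutation of the two bits


print(is_reversible(and_gate, 2))   # False
print(is_reversible(or_gate, 2))    # False
print(is_reversible(swap_gate, 2))  # True
```

Only the gates that fail this check are forced to pay the Landauer cost; the swap passes, which is why a permutation can in principle be free.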