How to find a sequence of digits in pi?

The project you've found is a (deliberate!) joke.

It is true that $\pi$ is suspected to be normal in all bases, which would imply that every finite sequence of hex digits appears somewhere (indeed, many times) in the hexadecimal expansion of $\pi$.

But this cannot be used for compression -- the trouble is that the number $A$ that tells you where to find your file in $\pi$ will -- in the vast majority of cases -- be so large that storing $A$ takes up even more space than it would take to store the original file.

The BBP formula is not particularly suited to finding a particular sequence in $\pi$, except by trial and error, or by starting somewhere in $\pi$ and producing digits until you happen to come across the sequence you're looking for. So, taken at face value, the goal of the project would be completely impossible -- just locating a ten-byte file would take lifetimes. (That is, some multiple of the lifetime of the universe.)

The kicker is in this part of the description:

> Now, we all know that it can take a while to find a long sequence of digits in $\pi$ so for practical reasons, we should break the files up into smaller chunks that can be more readily found.
>
> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in $\pi$.

So it doesn't actually "compress" anything -- it just stores, for each byte in the file, a position in $\pi$ where that particular byte can be found. (And finding so short a segment is certainly doable by brute force.) But storing such a position takes more than a byte. So all in all it's just a simple substitution code, with a particularly inefficient implementation.

(And then there's some further joking around, claiming that the indices don't count as space used because they're "metadata".)
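The per-byte scheme described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the names `pi_hex_digits`, `encode` and `decode` are made up here, the digits come from Machin's formula in integer arithmetic (chosen for simplicity, not BBP), and the choice of 8192 digits is an assumption that is merely very likely to contain every byte value.

```python
def pi_hex_digits(n):
    """First n hex digits of pi after the point (Machin's formula, big integers)."""
    guard = 8                                  # extra hex digits to absorb truncation error
    one = 1 << (4 * (n + guard))               # fixed-point representation of 1.0

    def arctan_inv(x):
        # arctan(1/x) scaled by `one`, summed from its alternating Taylor series
        total = term = one // x
        x2, k, sign = x * x, 3, -1
        while term:
            term //= x2
            total += sign * (term // k)
            k += 2
            sign = -sign
        return total

    pi_scaled = 4 * (4 * arctan_inv(5) - arctan_inv(239))
    frac = pi_scaled - 3 * one                 # drop the integer part 3
    return format(frac >> (4 * guard), "x").zfill(n)


def encode(data, digits):
    """Replace each byte by the first position where its two hex digits occur."""
    indices = []
    for b in data:
        pos = digits.find(format(b, "02x"))
        if pos < 0:
            raise ValueError(f"byte {b:#04x} not found in the available digits")
        indices.append(pos)
    return indices


def decode(indices, digits):
    """Read the byte back from each stored position."""
    return bytes(int(digits[i:i + 2], 16) for i in indices)


DIGITS = pi_hex_digits(8192)   # assumed to be enough digits; not guaranteed
indices = encode(b"pi", DIGITS)
print(indices, decode(indices, DIGITS))
```

Each entry of `indices` stands in for exactly one byte of input, and no two different byte values can share a first position, so the positions for all $256$ byte values cannot all fit in the range a single byte can hold except in a perfect coincidence.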


The Bailey-Borwein-Plouffe formula does not allow you to find a desired sequence of digits in $\pi$. As the Wikipedia page says, it allows you to find the hex digits starting at a desired place without calculating the preceding ones. So if you want the digits of $\pi$ starting at the billionth, this is your friend. This would be used in the decompression step.
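To make the "digits at a desired place" idea concrete, here is a minimal sketch of BBP digit extraction in Python. The function names `_series` and `pi_hex_digit` are invented for this example, and it uses ordinary double-precision floats, so it is only trustworthy for modest positions; a serious implementation would need more careful arithmetic.

```python
def _series(j, n):
    """Fractional part of sum over k >= 0 of 16**(n - k) / (8*k + j)."""
    s = 0.0
    # Terms with k <= n: reduce 16**(n - k) modulo the denominator first,
    # so only the fractional contribution is ever computed.
    for k in range(n + 1):
        d = 8 * k + j
        s = (s + pow(16, n - k, d) / d) % 1.0
    # Terms with k > n shrink by a factor of 16 each step; stop when negligible.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        s += term
        k += 1
    return s % 1.0


def pi_hex_digit(n):
    """Hex digit of pi at 0-based position n after the point (float sketch)."""
    x = (4 * _series(1, n) - 2 * _series(4, n)
         - _series(5, n) - _series(6, n)) % 1.0
    return "0123456789abcdef"[int(x * 16)]


print("".join(pi_hex_digit(n) for n in range(8)))
```

Note the direction of the lookup: the position `n` is the input and the digit is the output. Nothing here helps with the reverse problem of finding where a given string occurs.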

It is not proven that every digit sequence appears in $\pi$, but it is likely. I am not aware of any way besides brute force to find where a given sequence occurs. The problem with the idea is that the index for your file is likely to be as long as the file. Yes, instead of storing your file you store the index, but it is just as large and harder to use. Suppose you have a file that is $1000$ hex digits long. There are $16^{1000}$ sequences of $1000$ hex digits, so you would expect your string to occur somewhere around position $16^{1000}$, which takes about $1000$ hex digits to store. At this page you can search the first $200{,}000{,}000$ decimal digits of $\pi$ for a desired string. If you look for $12345678$, it reports that it occurs at position $186557266$, which is $9$ digits instead of $8$.
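The counting argument can be checked numerically for the decimal example. The sketch below assumes, purely as a heuristic, that the digits of $\pi$ behave like independent random digits; the function name `chance_of_hit` is made up for this illustration, and it is only meant for short strings (for very long ones the per-position probability underflows to zero in floating point).

```python
import math


def chance_of_hit(length, searched, base=10):
    """Heuristic chance that one specific `length`-digit string appears
    among the first `searched` positions, treating the digits as random
    and the positions as independent."""
    per_position = float(base) ** -length
    # 1 - (1 - p)**searched, computed stably for tiny p
    return -math.expm1(searched * math.log1p(-per_position))


# An 8-digit string searched for in the first 200,000,000 decimal digits.
print(round(chance_of_hit(8, 200_000_000), 3))

# The reported hit: the position needs more digits than the string itself.
print(len("12345678"), len(str(186557266)))
```

Under this heuristic an $8$-digit string has a good chance of turning up within $2 \times 10^8$ positions, but the position found is then typically an $8$- or $9$-digit number, so nothing is saved.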

Tags:

Pi