Information-theoretic derivation of the prime number theorem

You may be interested in this arXiv paper [1], "Some information-theoretic computations related to the distribution of prime numbers" (Ioannis Kontoyiannis, 2007).

It discusses Chebyshev's 1852 result,

$$ \sum_{p \leq n} \frac{\log p}{p} \sim \log n , $$

which is related to the prime number theorem and is used in proofs of it. The paper begins by sketching Billingsley's 1973 heuristic information-theoretic argument, then makes it rigorous. The heuristic is similar to yours: pick $N$ uniformly at random from $\{1,\dots,n\}$, which has entropy $\log n$, and write its unique prime factorization $N = \prod_{p \leq n} p^{X_p}$. The collection $\{X_p\}$ carries exactly the same information as $N$, so it has the same entropy. One then argues that the $X_p$ are approximately independent and geometrically distributed, with $P(X_p \geq k) \approx p^{-k}$; such a variable has entropy roughly $\frac{\log p}{p}$ for large $p$, and summing over $p \leq n$ recovers the left-hand side. (Note that we could take all the logs to any base without changing the claim, since the change of base cancels on both sides.)

[1] https://arxiv.org/abs/0710.4076
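
As a quick numerical sanity check on this heuristic, here is a short Python sketch (my own illustration, not from the paper). It compares Chebyshev's sum to $\log n$, and compares the exact law of $X_p$ for uniform $N$, namely $P(X_p \geq k) = \lfloor n/p^k \rfloor / n$, to the geometric tail $p^{-k}$. The choices $n = 10^6$ and $p = 3$ are arbitrary.

```python
import math

def primes_up_to(n):
    """All primes <= n via a simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]

n = 10 ** 6
primes = primes_up_to(n)

# Chebyshev's sum vs. log n; by the 1852 result the difference stays bounded.
chebyshev_sum = sum(math.log(p) / p for p in primes)
print(f"sum_(p<=n) log(p)/p = {chebyshev_sum:.4f},  log n = {math.log(n):.4f}")

# For N uniform on {1,...,n}, P(X_p >= k) = floor(n/p^k)/n exactly,
# which is close to the tail p^(-k) of a geometric distribution.
p = 3
for k in range(1, 5):
    exact = (n // p ** k) / n
    print(f"P(X_{p} >= {k}): exact = {exact:.6f}, geometric tail = {p ** -k:.6f}")
```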


The following argument seems related in spirit (though it proves far less), but may be of independent interest. Let $X$, $N$, etc., be as you defined them. Then $X = p_1^{E_1}\cdots p_k^{E_k}$, where $p_1,\dots,p_k$ are the primes up to $N$, the $E_i$ are (highly correlated) random variables, and $k = \pi(N)$. We then have $H(X) = H(E_1,\dots,E_k) \leq \sum_i H(E_i)$, since neglecting correlations can only increase entropy. Now $E_i \leq \log_2 N$, so each $E_i$ takes at most $1 + \log_2 N$ values and has entropy at most $\log_2(1 + \log_2 N)$. Since $H(X) = \log_2 N$ for $X$ uniform on $\{1,\dots,N\}$, this gives $\pi(N) \geq \log_2 N / \log_2(1 + \log_2 N) \approx \log_2 N / \log_2\log_2 N$. This is an information-theoretic proof of the infinitude of primes, but the estimate falls far, far short of the PNT.
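
For concreteness, here is a small Python sketch (my own illustration, not from the slides cited below) comparing $\pi(N)$ with this entropy bound; the sample values of $N$ are arbitrary. As expected, the bound is astronomically weaker than the truth.

```python
import math

def prime_count(n):
    """pi(n) via a simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return sum(sieve)

for N in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    # Entropy bound: pi(N) >= log2(N) / log2(1 + log2(N)).
    bound = math.log2(N) / math.log2(1 + math.log2(N))
    print(f"N = {N:>7}:  pi(N) = {prime_count(N):>6},  entropy bound = {bound:5.2f}")
```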

References: I had found a version of this back in grad school but never wrote it up. Others have thought about it too; you'll find this argument, and a bit more, in these slides: https://www.dpmms.cam.ac.uk/~ik355/PAPERS/itw-talk.pdf