Motivated account of the prime number theorem and related topics

To a certain extent, I think that analytic number theory really is magical, and there's a limit to how natural and motivated it can be. Of the accounts I have seen, the one in Donald Newman's book Analytic Number Theory comes the closest to helping you see how you might have come up with the key ideas, but even Newman occasionally pulls things out of a hat (in my opinion).

Having said that, I think that there are a few guiding principles that you can keep in mind as you try to learn this material. The first piece of magic has to be the Euler product $$\zeta(s) = \sum_{n\ge 1} {1\over n^s} = \prod_p {1 \over 1 - {\displaystyle 1\over \displaystyle p^s}}.$$ Though the proof of this formula is easy, I think it is no accident that it was someone of Euler's caliber who dreamed it up in the first place. It provides a crucial link between a discrete and somewhat chaotic-looking set (the primes) and an analytic function of a complex variable.
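Before going further, it may be reassuring to watch the Euler product converge numerically. The following is a minimal sketch, not part of any proof; the helper `primes_up_to` and the cutoff `N` are ad hoc choices of mine. At $s=2$ both sides tend to $\zeta(2)=\pi^2/6\approx 1.6449$.

```python
# Compare a partial sum of zeta(2) with a truncated Euler product.
def primes_up_to(n):
    """Elementary sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, n + 1, p):
                sieve[m] = False
    return [p for p, flag in enumerate(sieve) if flag]

s = 2.0
N = 100_000
zeta_sum = sum(1 / n ** s for n in range(1, N + 1))
euler_prod = 1.0
for p in primes_up_to(N):
    euler_prod *= 1 / (1 - p ** (-s))
print(zeta_sum, euler_prod)  # both close to pi^2/6 = 1.6449...
```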

Once you accept that $\zeta(s)$ is relevant to understanding the primes, even if you don't understand yet just how that is going to work, then it should make sense that you should try to understand $\zeta(s)$ thoroughly. A second piece of magic enters in, which is that holomorphic/meromorphic functions have strong uniqueness properties, and in particular if you can extend $\zeta(s)$ to a meromorphic function on the entire complex plane, then there is a unique way to do so. This is in stark contrast to real analysis. It may require ingenuity to see that $\zeta(s)$ can indeed be extended in this way, and it may be a miracle that it satisfies a nice functional equation, but since we know that the end result is canonical, we can employ whatever means necessary to analytically continue $\zeta(s)$, without worrying that we're introducing arbitrary choices of our own making.

Infinite sums are usually more tractable than infinite products, so that motivates studying $\log \zeta(s)$. This is going to blow up at the zeros, so if we want to understand limiting/asymptotic behavior, we have to understand the zeros. A third piece of magic enters here, namely the Cauchy integral formula and the residue theorem. In analysis we are always interested in power series expansions, and from our calculus classes we might think that the way to get at power series coefficients is to differentiate (Taylor series). But a basic principle of analysis is that differentiation tends to be less well-behaved than integration (because integration keeps small quantities small, while differentiation might not), and so if you can get at power series coefficients using integration rather than differentiation, then you'll generally prefer that.
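To make the last point concrete, here is a small illustrative sketch (the function name and all parameters are my own choices, nothing canonical): it recovers Taylor coefficients by discretizing the Cauchy integral formula $a_n=\frac{1}{2\pi i}\oint f(z)\,z^{-n-1}\,dz$ over a circle, a procedure which is numerically far more stable than repeated differentiation.

```python
import cmath

def taylor_coeff(f, n, radius=0.5, samples=4096):
    """Approximate the n-th Taylor coefficient of f at 0 by a Riemann
    sum for the Cauchy integral over the circle |z| = radius."""
    total = 0.0
    for k in range(samples):
        z = radius * cmath.exp(2j * cmath.pi * k / samples)
        total += f(z) / z ** n  # the dz = iz d(theta) factor cancels one power of z
    return total / samples

f = lambda z: 1 / (1 - z)  # Taylor coefficients are all equal to 1
print([round(taylor_coeff(f, n).real, 6) for n in range(5)])
```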

Hopefully this at least explains in outline why complex analysis enters the picture. It's natural to wonder whether all the stuff about meromorphic continuation and the Cauchy integral formula is just a convenience or whether it's really necessary. The fact that to this day, the only "elementary" proofs of the prime number theorem are even more mysterious than the complex-analytic proofs suggests that complex analysis is in some sense the right approach to the subject, even if it is not strictly necessary for the prime number theorem itself.

Possibly you're aware of all the above already, but are still finding it hard to wade your way through all the actual calculations with series and contour integrals and estimates. It may be helpful to consult Edwards's book on the Riemann zeta function. Edwards presents a translation of Riemann's original paper and walks you through it, using the benefit of modern knowledge. Riemann's paper is short and he was introducing highly original ideas for the first time, so you can really see the outline of the main ideas. Riemann's paper isn't easy to read in isolation, but I think that with Edwards's accompanying exposition, it can really help you see the big picture. EDIT: One additional remark—a beautiful feature of Riemann's work is that he gives an exact formula for the prime-counting function in terms of the nontrivial zeros of $\zeta(s)$. Especially if you're more comfortable with "soft" analysis than "hard" analysis, this gives you a clean conceptual picture of the relationship between counting primes and the zeros of $\zeta(s)$, without any need to get your hands dirty with concrete estimates and approximations.


ADDENDUM. The answer by Kostya_I inspired me to make some additional remarks about the following basic question, which is often not stressed in elementary courses but then is taken for granted in more advanced texts:

What do asymptotics have to do with singularities?

As Kostya_I suggests, it is helpful to look at the simpler case of Taylor series first. Note that the Taylor coefficients of $1/(1-z)$ (expanded around $z=0$) are constant, the coefficients of $1/(1-z)^2$ grow linearly, and the coefficients of $\log 1/(1-z)$ grow like $1/n$. These simple examples illustrate that if you have a Taylor series around $z=0$ that has a radius of convergence of $1$ and that has a unique singularity on the boundary $|z|=1$ (in these examples, at $z=1$), then the behavior of the singularity controls the asymptotic growth of the Taylor coefficients. If there are multiple singularities on the boundary then you potentially have to consider them all. This relationship is explained in great detail in (for example) Flajolet and Sedgewick's wonderful book Analytic Combinatorics, where they show that the asymptotic enumeration of many combinatorial objects can be accomplished by writing down an ordinary or exponential generating function for them, and analyzing the singularities.
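Here is a minimal sketch confirming the three growth rates by direct manipulation of the series (the names and the cutoff `N` are arbitrary): squaring the geometric series gives the coefficients of $1/(1-z)^2$, and integrating it term by term gives those of $\log 1/(1-z)$.

```python
from fractions import Fraction

N = 10
geo = [Fraction(1)] * (N + 1)  # 1/(1-z): all coefficients equal 1
# Cauchy convolution of the geometric series with itself: 1/(1-z)^2
sq = [sum(geo[k] * geo[n - k] for k in range(n + 1)) for n in range(N + 1)]
# Term-by-term integration of 1/(1-z): log(1/(1-z))
log_coeffs = [geo[n - 1] / n for n in range(1, N + 1)]

print([int(c) for c in sq])          # 1, 2, 3, ...: linear growth
print([str(c) for c in log_coeffs])  # 1, 1/2, 1/3, ...: decay like 1/n
```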

In analytic number theory, one is also interested in generating functions, but now the most relevant generating functions are Dirichlet series: $$\sum_{n\ge1} \frac{a_n}{n^z}.$$ The main reason for using Dirichlet series rather than Taylor series is that if the sequence $a_n$ is (completely) multiplicative, meaning that $a_m\cdot a_n = a_{mn}$, then we can write down an Euler product formula. Now, you might worry that Dirichlet series and Taylor series are radically different and that the theory of Dirichlet series has to be developed from scratch, but in fact some of the basic theory follows exactly the same lines. This is made clear by (for example) Serre's treatment in his Course in Arithmetic, where he defines a generalized Dirichlet series as follows: $$ \sum_{n\ge 1} a_n \exp(-\lambda_n z).$$ If we set $\lambda_n := \log n$ then we recover ordinary Dirichlet series, but if we set $\lambda_n := n$ and $x:=\exp(-z)$ then we recover Taylor series. By considering generalized Dirichlet series, Serre thereby gives a uniform proof of the basic facts about the domain of convergence. The Dirichlet analogue of a disc with a singularity at the radius of convergence is a half-plane of convergence with a singularity on the boundary line $\Re z = a$ for some $a$.
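For concreteness, the one-line check that both classical cases really are specializations: $$\lambda_n=\log n \implies \exp(-\lambda_n z)=n^{-z}, \qquad\qquad \lambda_n=n,\; x:=e^{-z} \implies \sum_{n\ge 1} a_n\exp(-\lambda_n z)=\sum_{n\ge 1} a_n x^n.$$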

The point is that the Dirichlet series for the Riemann zeta function $\zeta(z)$ obviously has a singularity at $z=1$, and so if you are used to the idea that the asymptotics are governed by the behavior at the boundary of the natural domain of convergence, then it becomes natural to expect that, to first order, the asymptotic behavior of the primes should have something to do with the behavior of $\zeta(z)$ on the line $\Re z = 1$. There is an extra twist, because the Euler product invites us to take the logarithm (or the derivative of the logarithm, which turns out to be easier to work with), and hence the crucial question is whether $\zeta(z)$ vanishes on $\Re z = 1$ (rather than whether it has any other singularities on that line).

One final comment is that Dirichlet's theorem on primes in arithmetic progression may be an easier example to understand than the prime number theorem. At first glance it might look more complicated because now you have to absorb the definition of a complex character and of an $L$-function, but if your background is in algebra then that should be no sweat. The key thing is that the analytic part of the argument is easier, because it turns out that all you need to prove is that the $L$-functions are nonvanishing at the point $z=1$ (rather than on a whole line). Historically, Dirichlet's proof preceded the prime number theorem, and even preceded Riemann's famous memoir on the zeta function, so that is perhaps some indirect evidence that it's an easier theorem. Again, this is all proved in Serre's Course in Arithmetic. If you want to understand the more complicated case of the prime number theorem and of Riemann's (or von Mangoldt's) exact formula in terms of the zeros of $\zeta(z)$ in the critical strip, then you'll have to look elsewhere (such as Edwards's book), but hopefully by that time the motivation for the calculations will be clearer.


You might like the short (150 page) book by Mazur and Stein:

Prime Numbers and the Riemann Hypothesis, Barry Mazur, William Stein, Cambridge University Press, 2016.

The discussion is definitely algebraic and geometric.


Let me record a pedestrian answer here. It all starts with Euler's formula $$ \prod_p\left(1-\frac{1}{p^s}\right)^{-1}=\sum_{n\geq 1}\frac{1}{n^s}=:\zeta(s),\quad s>1, $$ and the observation that since the RHS diverges for $s=1$, so does the LHS, and thus $\sum_p p^{-1}=+\infty$. This is already interesting, since it proves the infinitude of primes, and what is more, shows that the primes are rather dense. Indeed, let us assume for a moment that $p_n\sim cn^\alpha(\log n)^\beta$ for some $\alpha\geq 1$ and $\beta\in \mathbb{R}$, where $p_n$ denotes the $n$-th prime. The prime number theorem is equivalent to the statement that $c=\alpha=\beta=1$. The above observation, combined with $p_n\geq n$, restricts the exponents to $\alpha=1$, $0\leq\beta\leq 1$.
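As a quick numerical sanity check on the ansatz (an illustration only; it assumes sympy is available for its `prime` function), one can watch $p_n/(n\log n)$ drift, very slowly, toward $1$:

```python
from math import log
from sympy import prime  # prime(n) returns the n-th prime

for n in [10, 100, 1000, 10_000, 100_000]:
    print(n, prime(n) / (n * log(n)))
```

The slow convergence is no accident: more precisely $p_n = n(\log n + \log\log n - 1 + o(1))$, so the ratio approaches $1$ only at a $\log\log$ rate.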

Can we do better? Well, a natural idea would be to plug various candidate expansions into the LHS of Euler's identity and check what's compatible with the behaviour of the RHS as $s\to 1$. The behaviour of $\sum_n n^{-s}$ is easy to understand: we can approximate the sum by the integral $\int_1^\infty x^{-s}dx=\frac{1}{s-1}$; the error terms form a series that converges for all $s\geq \frac{1}{2}$. Thus, $\zeta(s)=(s-1)^{-1}+O(1)$ as $s\to 1$.
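Numerically (assuming mpmath is installed), the difference $\zeta(s)-\frac{1}{s-1}$ indeed stays bounded as $s\to 1$; in fact it tends to the Euler-Mascheroni constant $\gamma\approx 0.5772$:

```python
from mpmath import zeta

for s in [1.5, 1.1, 1.01, 1.001, 1.0001]:
    print(s, zeta(s) - 1 / (s - 1))  # approaches 0.5772... (Euler-Mascheroni)
```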

To understand the asymptotics of the LHS, it is convenient to pass from products to sums: $$ -\sum_n{\log\left(1-\frac{1}{p_n^s}\right)}=\sum_n\frac{1}{p^s_n}+O(1)=\log\frac{1}{s-1}+O(1),\quad s\to 1. $$ Plugging in the candidate expansion for $p_n$, we arrive at $$ \sum_n \frac{1}{n^s(\log n)^{s\beta}}\sim c\log\frac{1}{s-1},\quad s\to 1. $$ It is a nice elementary exercise to check that the left-hand side is asymptotic to $(s-1)^{\beta-1}\Gamma(1-\beta)$ for $\beta<1$, and to $\log\frac{1}{s-1}$ for $\beta=1$. We conclude that we must have $\beta=1$ and $c=1$. (The result we have just proven is due to Chebyshev. Note that no complex analysis was needed thus far.)
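For the reader who wants the exercise spelled out: approximating the sum by an integral and substituting $u=(s-1)\log x$ (so that $x^{-s}\,dx=e^{-u}\,du/(s-1)$) gives $$ \sum_{n\ge 2}\frac{1}{n^s(\log n)^{s\beta}} \approx \int_2^\infty \frac{dx}{x^s(\log x)^{s\beta}} = (s-1)^{s\beta-1}\int_{(s-1)\log 2}^\infty u^{-s\beta}e^{-u}\,du. $$ As $s\to 1$, the last integral tends to $\int_0^\infty u^{-\beta}e^{-u}\,du=\Gamma(1-\beta)$ when $\beta<1$, while for $\beta=1$ it is dominated by $\int_{(s-1)\log 2}^1 u^{-1}\,du\sim\log\frac{1}{s-1}$.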

Of course, in order to prove the PNT we need to remove the a priori assumption $p_n\sim cn^\alpha(\log n)^\beta$, and for that the information about the $s\to 1$ regime is not sufficient. Consider the following toy problem:

Compute the asymptotics of the Taylor coefficients of $\exp(x)(1-x)^{-2}$ at the origin.

An argument similar to the one above would say that of all "natural" asymptotics, only $a_n\sim e n$ is compatible with the behaviour as $x\to 1$. However, the function $(1+x^2)^{-2}+e(1-x)^{-2}$ has the same leading behaviour as $x\to 1$, but completely different asymptotics for its coefficients. The reason, of course, is that there are other singularities on the unit circle (at $x=\pm i$). In this problem, the role of the complex variable and of the domain of analyticity is very transparent.
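Here is an illustrative sketch of the toy problem (all names are ad hoc; the closed forms for the coefficients of each factor are used directly): the normalized coefficients $a_n/n$ settle near $e\approx 2.718$ for the original function, but oscillate for the perturbed one because of the singularities at $x=\pm i$.

```python
from math import e, factorial

def coeff_main(n):
    # exp(x) has coefficients 1/k! and 1/(1-x)^2 has coefficients m+1;
    # the product's coefficients are the convolution of the two.
    return sum((n - k + 1) / factorial(k) for k in range(n + 1))

def coeff_perturbed(n):
    # (1+x^2)^(-2) has coefficient (-1)^(n/2) * (n/2 + 1) for even n, else 0.
    extra = (-1) ** (n // 2) * (n // 2 + 1) if n % 2 == 0 else 0
    return e * (n + 1) + extra

for n in [100, 101, 102, 103]:
    print(n, coeff_main(n) / n, coeff_perturbed(n) / n)
```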

The PNT is not about Taylor coefficients, but it can be reduced to a problem about the growth of a Fourier transform of an analytic function. To this end, we differentiate the log of Euler's identity (conveniently, this gets rid of logs and makes our functions single-valued) to get $$ -\frac{\zeta'(z)}{\zeta(z)}=\sum_p \frac{\log p}{p^z-1}. $$ A natural simplification is to get rid of the $-1$ in the denominator; so, instead of the RHS, we consider $\Phi(z):=\sum_p(\log p)p^{-z}.$ (The difference is analytic for $\Re z>\frac12$, and so it will be unimportant.) Now, $\Phi$ is almost manifestly a Laplace transform: consider the distribution $$\rho(x):=(\log x)\sum_p\delta(x-p),$$ integrate by parts and change variables: $$ \Phi(z)=\int_1^\infty\rho(x)x^{-z}dx=z\int_1^\infty\theta(x)x^{-z-1}dx=z\int_0^\infty\frac{\theta(e^y)}{e^y}e^{-(z-1)y}dy, $$ where $\theta(x)=\int_1^x \rho(t)\,dt=\sum_{p\leq x}\log p$ and $\Re (z-1)>0$. It is elementary to check that the PNT is equivalent to $\theta(x)\sim x$, so we can work with $\theta$; there are also other ways to write $\Phi$ as a Laplace transform, including ones directly involving $\pi(x)$.
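A quick numerical look at $\theta(x)/x$ (an illustration only, with the same kind of elementary sieve as one would use for any such experiment):

```python
from math import log

def theta_over_x(x):
    """Chebyshev's theta(x)/x, where theta(x) = sum of log p over primes p <= x."""
    sieve = [True] * (x + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, x + 1, p):
                sieve[m] = False
    return sum(log(p) for p in range(2, x + 1) if sieve[p]) / x

for x in [10**3, 10**4, 10**5, 10**6]:
    print(x, theta_over_x(x))  # slowly approaches 1, in accordance with the PNT
```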

In Fourier analysis, there's a general principle that the rate of decay of a function at infinity corresponds to smoothness (or analyticity) of its Fourier transform. In particular, the Fourier transform being analytic in the $\epsilon$-neighbourhood of the real line corresponds to the function decaying as $O(\exp(-\epsilon |x|))$, give or take a bit of room. (In our case, the role of the real line is played by $\Re z=1$; translate and rotate to get into the more familiar set-up.) Now, $\Phi(z)$ has a pole at $z=1$, which just corresponds to the purported limit $\lim \theta(x)/x=1$: we have $$ \frac{\Phi(z)}{z}-\frac{1}{z-1}=\int_0^\infty\left(\frac{\theta(e^y)}{e^y}-1\right)e^{-y(z-1)}dy. $$ Also, $\Phi$ is automatically analytic to the right of $\Re z=1$ (due to $\theta$ vanishing identically on the negative axis). If we knew that the left-hand side of the above identity extended analytically to $1-\epsilon<\Re z\leq 1$, then this would mean that $$ \frac{\theta(e^{y})}{e^y}-1=O(e^{-y\epsilon'}) $$ for any $\epsilon'<\epsilon$, that is, $\theta(x)=x+O(x^{1-\epsilon'})$, which is the PNT with an error bound (you can get a bit sharper estimate here). Also, you see that if $\Phi$ had other singularities on the line $\Re z=1$, then $\theta$ would have different asymptotics.

Unfortunately, no one knows how to prove the above analytic extension, so the rest of the story boils down to showing that there are no singularities on the line $\Re z=1$ itself, together with either a subtle analytical argument that this is just enough to conclude the PNT without error bounds, or a proof that some explicit region around that line is free of zeros, which yields a (sub-power) error bound. The fact that there are no zeros on $\Re z=1$ is heuristically explained as follows: for $\prod_p(1-p^{-1-it})^{-1}$ to diverge to zero, the factors $p^{-it}$ must conspire in such a way that the "majority" of them points in the negative direction. But then the "majority" of $p^{-2it}$ points in the positive direction, meaning that $\prod_p(1-p^{-1-2it})^{-1}$ diverges to infinity and $\zeta$ has a pole at $1+2it$, which we know it does not. This heuristic is usually packed into an ingenious one-liner which is hard to motivate further. A slick version of the analytic lemma is in Newman's proof, which is the source of most of the material of this answer.
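For completeness, the one-liner being alluded to (the inequality is usually attributed to Mertens) runs as follows: since $3+4\cos\phi+\cos 2\phi = 2(1+\cos\phi)^2\ge 0$, expanding $\log|\zeta|$ via the Euler product gives, for $\sigma>1$, $$ \zeta(\sigma)^3\,\bigl|\zeta(\sigma+it)\bigr|^4\,\bigl|\zeta(\sigma+2it)\bigr| \ge 1. $$ If $\zeta$ had a zero at $1+it$, then as $\sigma\to 1^+$ the factor $|\zeta(\sigma+it)|^4$ would vanish to order at least $4$, overwhelming the order-$3$ pole of $\zeta(\sigma)^3$, and (since $\zeta(\sigma+2it)$ stays bounded) the left-hand side would tend to $0$, contradicting the inequality. The coefficients $3,4,1$ encode exactly the $p^{-it}$ versus $p^{-2it}$ heuristic above.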