Philosophy behind Yitang Zhang's work on the Twin Primes Conjecture

My understanding of this, which is essentially cobbled together from the various news accounts, is as follows:

Let $\pi(x;q,a)$ denote the number of primes less than $x$ that are congruent to $a\bmod q$, and $\pi(x)$ the number of primes less than $x$. Here $\phi(n)$ is Euler's totient function, the number of positive integers less than or equal to $n$ that are relatively prime to $n$. Denote by $EH(\theta)$ the assertion that for every $A > 0$ there is a constant $C = C(A)$ such that

$$\sum_{1\leq q \leq x^{\theta}} \max_{\gcd(a,q)=1} \left| \pi(x;q,a) - \frac{\pi(x)}{\phi(q)} \right| \le C\,\frac{x}{\log^A x} \qquad (*)$$ for all $x > 2$.
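To make the quantities in $(*)$ concrete, here is a small brute-force sketch (my own illustration, not from any of the papers; the parameter choices are artificial) that computes $\pi(x;q,a)$, $\phi(q)$ and the left-hand side of $(*)$ for small $x$ and a chosen $\theta$:

```python
from math import gcd

def primes_below(x):
    """Sieve of Eratosthenes: all primes less than x."""
    sieve = bytearray([1]) * x
    sieve[:2] = b"\x00\x00"
    for n in range(2, int(x ** 0.5) + 1):
        if sieve[n]:
            sieve[n * n :: n] = bytearray(len(sieve[n * n :: n]))
    return [n for n in range(x) if sieve[n]]

def phi(n):
    """Euler's totient by trial division."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

def eh_sum(x, theta):
    """Left-hand side of (*): sum over q <= x^theta of the worst residue-class error."""
    primes = primes_below(x)
    pi_x = len(primes)
    total = 0.0
    for q in range(1, int(x ** theta) + 1):
        counts = [0] * q                       # counts[a] = pi(x; q, a)
        for p in primes:
            counts[p % q] += 1
        total += max(abs(counts[a] - pi_x / phi(q))
                     for a in range(q) if gcd(a, q) == 1)
    return total

# The error sum should grow much more slowly than x when theta < 1/2.
for x in (10**3, 10**4, 10**5):
    print(x, round(eh_sum(x, 0.45), 1))
```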

The Bombieri–Vinogradov theorem asserts that $EH(\theta)$ holds for every $\theta < 1/2$, and the Elliott–Halberstam conjecture asserts that $EH(\theta)$ holds for every $\theta < 1$.

In the mid-2000s Goldston, Pintz and Yildirim proved that if the Elliott–Halberstam conjecture holds for any level of distribution $\theta>1/2$, then there are infinitely many bounded prime gaps (where the size of the gap is a function of $\theta$; for $\theta>0.971$ they get a gap of size 16). Since Bombieri–Vinogradov tells us that $EH(\theta)$ holds for $\theta < 1/2$, in some sense the Goldston–Pintz–Yildirim argument just barely misses giving bounded gaps.

On the other hand, in the 1980s Fouvry and Iwaniec were able to push the level of distribution in the Bombieri–Vinogradov theorem above $1/2$ at the expense of (1) removing the absolute values, (2) removing the maximum over residue classes and (3) weighting the summand by a "well-factorable" function. This was subsequently improved in a series of papers by Bombieri, Friedlander and Iwaniec. For the Goldston–Pintz–Yildirim argument the first two restrictions did not pose a significant hurdle; however, the well-factorable weight appeared to prevent these results from being combined with the Goldston–Pintz–Yildirim machinery.

This is where my knowledge becomes extremely spotty, but my understanding is that Zhang replaces the Goldston–Pintz–Yildirim sieve with a slightly less efficient one. While less efficient, this modification gives him some additional flexibility. He is then able to divide the quantity estimated in (*) into several parts. Some of these are handled by well-established techniques; however, in the most interesting range of parameters he is able to reduce the problem to something involving a well-factorable function (or something similar), which he ultimately handles with arguments similar to those of Bombieri, Friedlander and Iwaniec.

Zhang's argument reportedly gives a gap size of around 70 million. My suspicion is that this bound will quickly decrease. In their theorem, Bombieri, Friedlander and Iwaniec obtain a level of distribution around $4/7$, whereas (according to these notes) Zhang seems to be working with a level of distribution of the form $1/2 + \delta$ for $\delta$ on the order of $1/1000$, so there is likely room for optimization. As a reference point, if one had the level of distribution $55/100$ ($< 4/7$) in the unmodified Elliott–Halberstam conjecture (without the well-factorable weight), the Goldston–Pintz–Yildirim argument gives infinitely many gaps of size less than 2956.

The parity problem, however, is widely expected to prevent approaches of this form (at least on their own) from producing a gap smaller than 6. Of course, there are more technical obstructions as well, and even on the full Elliott–Halberstam conjecture one can only get a gap of 16 at present.


The preprint is now available.

http://annals.math.princeton.edu/wp-content/uploads/YitangZhang.pdf


What new approach did Yitang Zhang try & what did the experts miss in the first place?

Yes, it is a good question why (say) FI did not hit upon such a result, as the two major components, the dispersion method à la BFI and the beating of the square-root barrier as in Friedlander/Iwaniec, are due to them. As Zhang puts it (last paragraph of Section 2, page 7), he saves a factor of $\sqrt r$ in a Kloosterman-type bound, and can take $r$ to be a small power (well, I think he should take it larger than he does, after review).

Almost everything else in the paper is rather "standard": the idea of restricting to integers/moduli which have no small/large factors is common, the dispersion method is in BFI, and maybe in Section 10 Zhang has to work a little to get his conditions on congruence classes to go through. But the Type III (trilinear) estimate in Sections 13 and 14 is the heart of why he wins. Indeed, he handles each $d$ separately, rather than on average (as in the dispersion method or the large sieve). Conceptually he brings in a Weyl shift to copy a sum many times, and then turns to Fourier techniques. The extra flexibility of factoring $d=qr$ is not lost, for he uses the Chinese Remainder Theorem to recombine, and in fact Ramanujan sums pop out too. See the bottom of page 52, from 14.13 down to the top line of page 53, where the Birch–Bombieri bound is applied. In $J(m_1,m_2)$ the part coming from $r$ is a Ramanujan sum, so the double $r$-sum is bounded in essence by $r$, not $r^{3/2}$. Again the preceding Fourier analysis takes technique, but the idea is already textbook.
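For readers who have not met them, the Ramanujan sum is $c_r(m)=\sum_{(a,r)=1}e(am/r)$; it is an integer whose size, on average over $m$, is of order $r^{o(1)}$ rather than the trivial bound $\phi(r)$, and that averaged saving is exactly the "bounded in essence by $r$, not $r^{3/2}$" being described. A tiny numerical illustration (my own, not from the paper):

```python
from cmath import exp, pi
from math import gcd

def ramanujan_sum(r, m):
    """c_r(m) = sum over a coprime to r of exp(2*pi*i*a*m/r); always a rational integer."""
    s = sum(exp(2j * pi * a * m / r) for a in range(1, r + 1) if gcd(a, r) == 1)
    return round(s.real)

def phi(r):
    return sum(1 for a in range(1, r + 1) if gcd(a, r) == 1)

r = 210  # 2 * 3 * 5 * 7
values = [abs(ramanujan_sum(r, m)) for m in range(1, r)]
print("trivial bound phi(r):", phi(r))                                   # 48
print("average |c_r(m)| over 0 < m < r:", round(sum(values) / len(values), 2))
```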

So I repeat: the main advance was not to apply Deligne to something like $Z(k;m_1,m_2)$ below 14.12 on page 52 directly with modulus $d$, but first to peel off a factor, small perhaps but useful, writing $d=qr$ and leaving $q$ to the geometry and $r$ to the Ramanujan sum, after spinning out the $N_3'$ sum over this modulus. Well, that is my reading; I don't claim to understand it all, not even philosophically why this line should give the win.

The Friedlander/Iwaniec paper: http://www.jstor.org/stable/1971175

PS. It could prove useful to have a rewrite of Sections 13 and 14, independent of the rest of the work, for he chooses parameters there for his particular purpose and this obscures the generality. These sections are essentially independent of the whole.

Or again, look at $P_2$ near the bottom of page 48. This should be of size $d_1^{3/2}K^2$ by direct reasoning, but Zhang first splits $n$ into residue classes modulo $r$. It is still mysterious to me why this helps, factoring $V$ into $W\cdot C$ in 14.7, the latter a Ramanujan sum.

Let me try again, and briefly sketch the whole idea of Sections 13 and 14.

Zhang wants to estimate $|\Delta(\gamma,d,c)|$ where $\gamma=\alpha\star\chi_{N_3}\star\chi_{N_2}\star\chi_{N_1}$ is a triple convolution with $N_1\ge N_2\ge N_3$ of decent size, say $N_3\ge x^{1/4-6\omega}$. The characteristic functions $\chi_N$ are indicators of intervals, say from $N$ to $N+N/(\log N)^B$. Here we have the standard discrepancy $\Delta$ defined as $$\Delta(\gamma,d,c)=\sum_{n\sim x\atop n\equiv c (d)}\gamma(n) -{1\over\phi(d)}\sum_{n\sim x\atop (n,d)=1}\gamma(n).$$ Most importantly, $d=qr$ with $r$ a convenient factor. The desired estimate is $|\Delta(\gamma,d,c)|\ll x^{1-\kappa}/d$ for some positive $\kappa$.
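Just to fix the notation, here is a toy computation of $\Delta(\gamma,d,c)$ (entirely my own; the intervals and the modulus are artificial and far smaller than Zhang's, and I take $\alpha\equiv 1$), with $\gamma$ a convolution of indicator functions of intervals:

```python
from math import gcd

def phi(n):
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def triple_convolution(x, ranges):
    """gamma(n) = #{ (n1, n2, n3) : n = n1*n2*n3, n_i in the given intervals }."""
    gamma = [0] * (x + 1)
    (a1, b1), (a2, b2), (a3, b3) = ranges
    for n1 in range(a1, b1 + 1):
        for n2 in range(a2, b2 + 1):
            for n3 in range(a3, b3 + 1):
                m = n1 * n2 * n3
                if m <= x:
                    gamma[m] += 1
    return gamma

def discrepancy(gamma, d, c):
    """Delta(gamma, d, c) and the expected count per reduced residue class."""
    in_class = sum(g for n, g in enumerate(gamma) if n % d == c % d)
    main = sum(g for n, g in enumerate(gamma) if n > 0 and gcd(n, d) == 1) / phi(d)
    return in_class - main, main

gamma = triple_convolution(200000, [(40, 80), (40, 80), (20, 40)])
delta, main = discrepancy(gamma, 101, 7)
print("expected per class:", round(main, 1), "  deviation Delta:", round(delta, 1))
```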

Zhang replaces $\chi_{N_1}$ by a smooth approximant. This is standard; there are various versions of this in the field, the idea being that if a function goes from 0 to 1 over an interval of length $Y$, you can control its derivatives by powers of $Y$. Then one executes the inner sum in $\Delta$ over this variable, replacing the sum over the smoothed function by its Fourier transform. This allows the main terms in $\Delta$ to cancel against the frequency-0 contribution, leaving one to deal with the higher frequencies. See the middle of page 45 and following. Copying, $$\sum_{n\equiv c (d)}\gamma^\star(n)=\sum_{(m,d)=1}\alpha(m) \sum_{n_3\sim N_3\atop (n_3,d)=1}\sum_{n_2\sim N_2\atop (n_2,d)=1}\sum_{mn_3n_2n_1\equiv c (d)}f(n_1),$$ and the inner sum is $${1\over d}\sum_{|h|\le H} \hat f(h/d)e_d(-ch\overline{mn_3n_2})+O(\cdots)$$ where $H=d/N_1$ essentially. The sum over $m$ takes care of itself; it is the inner part that must yield the cancellation.
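The completion step can be checked exactly in a finite model: by orthogonality of the additive characters $e_d$, any finitely supported $f$ satisfies $\sum_{n\equiv c\,(d)}f(n)={1\over d}\sum_{h\bmod d}\hat f(h)e_d(-ch)$ with $\hat f(h)=\sum_n f(n)e_d(hn)$; the role of the smoothing is that $\hat f$ then decays, so only frequencies $|h|\le H\approx d/N_1$ matter. A quick numerical check of the exact identity (mine, with an unsmoothed indicator just to verify the algebra):

```python
from cmath import exp, pi

def e_d(x, d):
    """The additive character e_d(x) = exp(2*pi*i*x/d)."""
    return exp(2j * pi * x / d)

def completed_sum(f, d, c):
    """(1/d) * sum_{h mod d} fhat(h) * e_d(-c*h), with fhat(h) = sum_n f(n) e_d(h*n)."""
    fhat = [sum(v * e_d(h * n, d) for n, v in f.items()) for h in range(d)]
    return sum(fhat[h] * e_d(-c * h, d) for h in range(d)) / d

f = {n: 1.0 for n in range(100, 400)}   # indicator of an interval
d, c = 37, 5
direct = sum(v for n, v in f.items() if n % d == c)
print(direct, round(completed_sum(f, d, c).real, 6))   # the two numbers agree
```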

So Zhang's goal is to obtain the estimate $$\sum_{1\le h\le H}\sum_{n_3\sim N_3\atop (n_3,d)=1}\sum_{n_2\sim N_2\atop(n_2,d)=1}\hat f(h/d)e_d(-ch\overline{mn_2n_3})\ll x^{1-\kappa}/M.$$ In fact, to use a Möbius inversion device that I omit below, we need this not just for $d$ near $\sqrt x$ (beyond the Bombieri–Vinogradov range), but also for divisors of $d$, so over a wider range. When $d$ is small enough, or to say it another way when $N_2$ is large enough, a one-variable estimate suffices. Actually the trickiest case is when $N_1\sim N_2\sim N_3\sim x^{5/16-9\omega/2}$ (maybe this is not the exact $\omega$ multiplier, but 5/16 is right) and $d\sim x^{5/12-9\omega}$, where neither a one-variable method nor what follows comes easily. But when $d\le N_1$ the $H$-sum is essentially empty, and when $d^{3/2}N_3\ll x^{1-\kappa}/M$ a one-variable bound on the $n_2$-sum suffices. I digress. In fact, as noted at (2.4) in Friedlander/Iwaniec, a two-variable bound coming from Deligne can sometimes be applied (the inner double sum is then bounded by roughly $\sqrt{d^2}=d$). Zhang does not use this extra input; I find it gives a suitable bound when $d^2\ll x^{1-\kappa}/M$.
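The one-variable estimate alluded to is square-root cancellation in incomplete sums of $e_d(a\bar n)$, which for prime $d$ comes from completing the sum and applying the Weil bound for Kloosterman sums. A quick numerical illustration (my own parameters):

```python
from cmath import exp, pi
from math import sqrt, log

def incomplete_sum(d, a, N):
    """sum over n <= N with (n, d) = 1 of exp(2*pi*i*a*nbar/d), nbar the inverse of n mod d."""
    return sum(exp(2j * pi * a * pow(n, -1, d) / d)
               for n in range(1, N + 1) if n % d != 0)

d = 10007   # a prime modulus
for N in (1000, 3000, 10000):
    s = abs(incomplete_sum(d, 3, N))
    print(f"N={N:6d}   |sum| = {s:8.1f}   trivial bound = {N}   sqrt(d)*log(d) = {sqrt(d)*log(d):.0f}")
```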

Back to the main story. After an application of Möbius to insert a coprimality condition in the frequency variable $h$, Zhang then uses the idea of the Weyl shift. It is key that he shifts by multiples of $r$, the convenient factor of $d$. The idea of the Weyl shift is to copy a sum many times, each copy shifted by much less than the sum's length. In the above, Zhang replaces $n_2$ by $n_2+hkr$ for $k$ up to some bound $K$, and then computes that the difference between the sum and the $hkr$-shifted sum is small (provided $K$ is small enough, of course). Then one wants to bound the average over the shifts $$N(d,k)=\sum_{1\le h\le H\atop (h,d)=1}\sum_{n_3\sim N_3\atop (n_3,d)=1}\sum_{n_2\sim N_2\atop(n_2+hkr,d)=1}\hat f(h/d)e_d(-ch\overline{m(n_2+hkr)n_3}).$$ I have simplified the formula at the top of page 47 a bit, not including the Möbius step, so the above is not literally correct, but it gives a self-contained idea of how it goes. To state it again: we want to bound $N(d,0)$, we know that $N(d,k)$ is close to $N(d,0)$ for small $k$, and we will establish a bound on average for ${1\over K}\sum_{k\sim K} N(d,k)$.
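The shifting step itself is elementary: a sum of length $N$ and its copy shifted by $t\ll N$ differ only in $O(t)$ boundary terms, so the original sum equals the average of many shifted copies up to a small error, and one is free to estimate that average instead. A toy check (mine, with an arbitrary oscillating summand in place of Zhang's exponential):

```python
from cmath import exp, pi

def f(n):
    """An arbitrary oscillating summand standing in for the exponential."""
    return exp(2j * pi * 0.1234567 * n * n)

N, r, K = 50000, 37, 50
original = sum(f(n) for n in range(N, 2 * N))
# average of the K sums shifted by k*r, k = 1..K
shifted_avg = sum(sum(f(n + k * r) for n in range(N, 2 * N))
                  for k in range(1, K + 1)) / K
print("difference:", round(abs(original - shifted_avg), 1), "   sum length:", N)
```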

From Cauchy–Schwarz, after substituting $l\equiv \bar hn_2$ modulo $d$, one is left to estimate ($P_2$ at the bottom of page 48) $$\sum_{l (d)}\Big|\sum_{k\sim K\atop (l+kr,d)=1}\sum_{n\sim N_3\atop (n,d)=1} e_d(b\overline{(l+kr)n})\Big|^2.$$ Staring at this, if you expand the square, the $l$-sum runs over a full set of residues modulo $d$ and the $n$-sums are essentially incomplete sums to the same modulus, so one expects $d^{3/2}K^2$. But the shift by a multiple of $r$ will allow us to win. The idea is that $N_3$ exceeds $r$ by enough that splitting $n=rn'+s$ into residue classes modulo $r$ is efficient. Normally this should not gain anything; see the bottom of page 49 with 14.6, where Zhang wants to estimate $$\sum_{k_1\sim K}\sum_{k_2\sim K}\sum_{s_1\le r\atop (s_1,r)=1}\sum_{s_2\le r\atop (s_2,r)=1}\sum_{n_1\sim N_3/r}\sum_{n_2\sim N_3/r}\sum_{l (d)} e_d\big(b\overline{(l+k_1r)(n_1r+s_1)}-b\overline{(l+k_2r)(n_2r+s_2)}\big).$$ Again one doesn't expect to win: although $e_d$ factors over $d=qr$ as $e_q(\cdot)e_r(\cdot)$, there are still three sums over variables modulo $r$, so $r^{3/2}$ should appear. But because the Weyl shift was by a multiple of $r$, upon unwinding the Chinese Remainder Theorem the triple sum modulo $r$ is really a double sum of a Ramanujan sum. That is why I typed out the innards of $e_d$ above.
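The two arithmetic facts being used here are easy to check numerically (my own sketch): for $(q,r)=1$ the character factors as $e_{qr}(a)=e_q(a\bar r)\,e_r(a\bar q)$, where $\bar r$ is the inverse of $r$ mod $q$ and $\bar q$ the inverse of $q$ mod $r$; and a sum of $e_r(b\bar t)$ over $t$ in a reduced residue system mod $r$ is exactly the Ramanujan sum $c_r(b)$, which is small on average over $b$.

```python
from cmath import exp, pi
from math import gcd

def e(x, m):
    """e_m(x) = exp(2*pi*i*x/m)."""
    return exp(2j * pi * x / m)

q, r = 101, 91   # coprime moduli
for a in (1, 5, 1234):
    lhs = e(a, q * r)
    rhs = e((a * pow(r, -1, q)) % q, q) * e((a * pow(q, -1, r)) % r, r)   # CRT factorization
    print("CRT factorization holds:", abs(lhs - rhs) < 1e-9)

def ramanujan(r, b):
    """sum over t coprime to r of e_r(b * tbar) = c_r(b), since tbar runs over the units too."""
    return sum(e((b * pow(t, -1, r)) % r, r) for t in range(1, r) if gcd(t, r) == 1)

phi_r = sum(1 for t in range(1, r) if gcd(t, r) == 1)
avg = sum(abs(ramanujan(r, b)) for b in range(1, r)) / (r - 1)
print("phi(r) =", phi_r, "   average |c_r(b)| =", round(avg, 2))
```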

Working through the technicalities with the Fourier transforms, this gives the result as needed. The key is that $r$ can be taken to be a small power of $x$, and we win by $\sqrt r$, or really by $r^{1/4}$ after Cauchy–Schwarz, but this is enough.