Lagrange four squares theorem

The set $X$ doesn't have to be the set of non-negative integers. This was known already to Härtter and Zöllner in 1977, who constructed an $X$ of the form $\{ 0, 1, 2, \ldots \} \setminus S $ for an infinite $S$.

For any $\varepsilon>0$, Erdős and Nathanson proved the existence of a set $X$ with $|X \cap [0,n]| = O(n^{\frac{3}{4} +\varepsilon})$, so that already provides an upper bound for your second question.

The problem was essentially settled by Wirsing in 1986, who proved that one has $X$ with $|X \cap [0,n]| = O(n^{1/2}\log^{1/2} n)$. As the lower bound $|X \cap [0,n]| =\Omega(n^{1/2})$ is obvious, this leaves a very small gap for improvement.
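
For completeness, here is one way to see the lower bound; this is a sketch of the standard counting argument, with $N$ an auxiliary parameter that is not in the results quoted above. If every integer in $[0,N]$ is a sum of four squares of elements of $X$, then each such element is at most $\sqrt{N}$, so the number of available $4$-tuples must be at least $N+1$:

$$|X \cap [0,\sqrt{N}]|^4 \ge N + 1 .$$

Taking $N = n^2$ gives $|X \cap [0,n]| \ge (n^2+1)^{1/4} > n^{1/2}$.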

Spencer has found a different proof of Wirsing's result.

Other relevant references may be found on the second page of a paper of Vu. Note that most of these proofs are probabilistic.


I know this question has already been sufficiently answered, but I would like to mention an explicit construction of Choi-Erdős-Nathanson, as opposed to the probabilistic proofs that are common in this field. Even though the result is a bit weaker than the results that can be proven probabilistically, it is, in my opinion, still worth sharing.

Theorem (Choi-Erdős-Nathanson). There exists a set of squares $S$ with $|S| < cn^{\frac{1}{3}}\log(n)$ such that every positive integer smaller than or equal to $n$ can be written as a sum of at most $4$ elements of $S$.

Proof. Let $x = n^{\frac{1}{3}}$ and define the sequences $a_i = i$, $b_i = \left \lfloor x\sqrt{i} \right \rfloor $ and $c_i = \left \lfloor x\sqrt{i} \right \rfloor -1$. Now let $A = \displaystyle \bigcup_{i=1}^{\left \lfloor 2x \right \rfloor} \{a_i^2\}$, $B = \displaystyle \bigcup_{i = 4}^{\left \lfloor x \right \rfloor} \{b_i^2\}$, $C = \displaystyle \bigcup_{i = 4}^{\left \lfloor x \right \rfloor} \{c_i^2\}$, and set $D = A \cup B \cup C$, so that $|D| < 4x = 4n^{\frac{1}{3}}$. We first prove that every positive integer $m$ with $m \le n$ and $m \not \equiv 0 \pmod{4}$ can be written as a sum of at most $4$ elements of $D$.

From Lagrange's theorem it follows that if $m \le 4x^2$, then $m$ is the sum of at most $4$ elements of $A$, so we may assume $4x^2 < m \le n$. With $k$ defined as $k = \left \lfloor \frac{m}{x^2} \right \rfloor$, it is clear that $4 \le k \le x$. If we let $d$ be equal to $\left \lfloor x\sqrt{k} \right \rfloor$, then $d^2 \in B \subset D$ and $(d-1)^2 \in C \subset D$, and $d^2 \le kx^2 \le m$. By Gauss' Theorem on sums of three squares, at least one of $m - d^2$ and $m - (d-1)^2$ can be written as the sum of three squares: since $m \not \equiv 0 \pmod{4}$ and $d$, $d-1$ have opposite parity, one of these two differences is congruent to $1$ or $2$ modulo $4$, and so is not of the form $4^a(8b+7)$. Moreover, $m - (d-1)^2 < (k+1)x^2 - (x\sqrt{k}-2)^2 < 4x^2$ for $x$ large enough (and the small values can easily be checked by hand), so each of the three squares involved is the square of an integer smaller than $2x$. So we conclude that either $m - d^2$ or $m - (d-1)^2$ can be written as the sum of at most three elements of $A$.
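
The parity step in this paragraph can be sanity-checked numerically. The following is only a sketch: the function names are mine, and the three-squares test uses the classical criterion that $t \ge 0$ is a sum of three squares unless $t = 4^a(8b+7)$. It checks every $d \ge 1$ with $d^2 \le m$, which is more than the proof needs.

```python
import math

def is_sum_of_three_squares(t):
    """Three-squares criterion: t >= 0 is a sum of three squares
    unless t has the form 4^a * (8b + 7)."""
    if t < 0:
        return False
    while t > 0 and t % 4 == 0:
        t //= 4
    return t % 8 != 7

def parity_step_holds(limit):
    """For every m <= limit with m % 4 != 0 and every d >= 1 with d*d <= m,
    check that m - d^2 or m - (d-1)^2 passes the three-squares criterion."""
    for m in range(1, limit + 1):
        if m % 4 == 0:
            continue
        for d in range(1, math.isqrt(m) + 1):
            if not (is_sum_of_three_squares(m - d * d)
                    or is_sum_of_three_squares(m - (d - 1) ** 2)):
                return False
    return True
```

Calling `parity_step_holds(2000)` runs the check for all admissible $m \le 2000$.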

Now all we need to deal with are the integers $m \le n$ such that $m \equiv 0 \pmod{4}$. Such an $m$ can be written as $m = 4^jm'$ with $1 \le j \le \frac{\log(n)}{\log(4)}$ and $m' \not \equiv 0 \pmod{4}$, so that $m'$ can be written as the sum of at most $4$ elements of $D$. So define $S_i = \{4^i d \mid d \in D\}$ (again a set of squares, since $4^i = (2^i)^2$) and $S = \displaystyle \bigcup_{i=0}^{\left \lfloor \frac{\log(n)}{\log(4)}\right \rfloor} S_i$. Then $|S| < 4n^{\frac{1}{3}}\left(\frac{\log(n)}{\log(4)} + 1\right)$ and every positive integer smaller than or equal to $n$ is the sum of at most $4$ elements of $S$.
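
Since the construction is explicit, it can be tried out for small $n$. Below is a minimal sketch; the helper names `iroot`, `build_S` and `uncovered` are mine and not from the paper. To avoid floating-point trouble with $n^{1/3}$, the floors are computed with integers only, using $\lfloor 2x \rfloor = \lfloor (8n)^{1/3} \rfloor$ and $\lfloor x\sqrt{i} \rfloor = \lfloor (n^2 i^3)^{1/6} \rfloor$.

```python
def iroot(v, k):
    """Largest integer r with r**k <= v (float guess, then exact correction)."""
    r = int(round(v ** (1.0 / k)))
    while r ** k > v:
        r -= 1
    while (r + 1) ** k <= v:
        r += 1
    return r

def build_S(n):
    """Build the set S from the proof for a given n >= 1, with x = n^(1/3)."""
    fx = iroot(n, 3)        # floor(x)
    f2x = iroot(8 * n, 3)   # floor(2x)
    A = {i * i for i in range(1, f2x + 1)}
    B = {iroot(n * n * i ** 3, 6) ** 2 for i in range(4, fx + 1)}
    C = {(iroot(n * n * i ** 3, 6) - 1) ** 2 for i in range(4, fx + 1)}
    D = A | B | C
    S = set()
    p = 1                   # p runs over 4^i with 4^i <= n
    while p <= n:
        S |= {p * d for d in D}
        p *= 4
    return S

def uncovered(n, S):
    """Integers in [1, n] that are NOT a sum of at most 4 elements of S
    (repetitions allowed)."""
    elems = sorted(s for s in S if s <= n)
    two = {0}               # all sums of at most two elements, capped at n
    two.update(elems)
    two.update(a + b for a in elems for b in elems if a + b <= n)
    return [m for m in range(1, n + 1)
            if not any((m - p) in two for p in two if p <= m)]
```

For instance, `uncovered(500, build_S(500))` lists the integers up to $500$ that the construction misses, and `len(build_S(500))` can be compared with the bound $4n^{\frac{1}{3}}\left(\frac{\log(n)}{\log(4)} + 1\right)$.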


Back to my original suggestion, it appears that taking $X$ to be the numbers with at most four prime factors is (basically) enough. Indeed, a paper of Tsang and Zhao shows that every sufficiently large integer $N \equiv 4 \pmod{24}$ can be written as $N = x_1^2 + x_2^2 + x_3^2 + x_4^2$ with $x_i \in P_4$ for $i = 1,2,3,4$, where $P_r$ is the set of numbers with at most $r$ prime factors. I am guessing that one only needs to take $r$ slightly larger, and to allow possibly non-primitive representations, to cover the remaining congruence classes.