There are between 1 and N about \sqrt{N} perfect squares, which makes this set S\subset [N] a quite sparse set, of density N^{-1/2}. Given that \lvert S\rvert^2 \sim N, it could be reasonable to think that S+S, with some luck, cover all of [N] or at least have positive density. In fact it’s not the case.

To estimate the size of S+S, notice first that it contains all the primes congruent to 1 mod 4, which is a well-known theorem of Fermat, which is about one half of the primes, so that \lvert 2S\rvert \gg \frac{N}{2\log N}. This already a huge lot bigger than the squares alone.

In fact there are much more sums of two squares than these primes, because all their products are also sums of two squares. In fact, it contains all the products of such primes. A similar heuristics to the one leading to the prime number theorem suggests that the density of the numbers having no prime factor congruent to 3 mod 4 should be

\prod_{\substack{p\equiv 3[4]\\p\leq N}} (1-1/p), and the log of this product should be roughly

-\sum_{\substack{p\equiv 3[4]\\p\leq N}}1/p\sim -1/2\log\log N

so we get a density  \frac{\lvert S+S\rvert}{N}\sim 1/\sqrt{\log N}. While this indeed is no proof, it should be believable. In general, a number n is a sum of two squares if and only if it is of the form n=ab^2 where a is product of primes congruent to 1 mod 4 and b is a product of primes congruent to 3 mod 4. Thus, the total number of sums of two squares seem to be bounded above (using the union bound) by something like

\sum'_b \lvert 2S\cap [N/b^2]\rvert

where the \sum' denotes a sum restricted over number whose prime factors are all 3 mod 4. But this should be of the order of

\sum'_b \frac{N/b^2}{\sqrt{\log N}}\leq \sum_b1/b^2N/\sqrt{\log N}

and the sum is bounded, so certainly \lvert 2S\rvert\leq C\frac{N}{\sqrt{\log N}}. In fact, Landau has proven in 1908 that the number of sums of two squares until capital N is asymptotic to

\frac{1}{\sqrt{2}}\prod_{p\equiv 3[4]}(1-p^{-2})^{-1/2}\frac{N}{\sqrt{\log N}}

which fits quite well with what the heuristics indicate (the product over the restricted primes corresponds perfectly to the restricted sum).

Repartition of the sums of squares in residue classes

We see that the primes are analogous to the sums of two squares: indeed, the primes are the numbers which have no  factor, and the sums of two squares are the numbers which have no factor congruent to 3 mod 4 (plus other ones of course, the products of such numbers by 9, 49, 121 etc. of which we saw that it simply multiplied the size of the set by a constant). The same heuristic provides a density which then turns out to be correct. So why not, as Dirichlet did for the primes, ask for the repartition of the sums of squares in residue classes?

A sum of two squares n=a^2+b^2 can only be congruent to 0,1 or 2 mod 4, because squares are 0 or 1 mod 4. Let’s show that these values are not equiprobable in fact. First notice that n can be written as n=2^kn' where n'\equiv 1 [4] is also a sum of two squares. So given the number k_1 of n\equiv 1[4] until N which are sums of two squares, we get k_2=1/2 k_1 and k_0=(1/4+1/8+\cdots)k_1=1/2 k_1. At least, the probabilities of being even or odd are the same. Another instance is that the sums of two squares are twice as dense in n\equiv 1[4] than in n\equiv 1[2]. Also notice that everybody in n\equiv 3[9] has v_3(n)=1 so nobody there can be a sum of two squares. The final word is by Prachar. Putting B= \frac{1}{\sqrt{2}}\prod_{p\equiv 3[4]}(1-p^{-2})^{-1/2} the constant appearing above, he found

\sum_{n\leq x,n\equiv a[q]}1_{S+S}(n)\sim B_q\frac{x}{\sqrt{\log x}}


B_q=Bq^{-1}\frac{(4,q)}{(2,q)}\prod_{p\equiv 3[4],p\mid q}(1+p^{-1})

when a\equiv 1\text{ mod } (4,q);  thus the density is the same as the general density in APs, except when the modulus q is divisible by 4 (which causes a multiplication by 2 of the density) and/or it has factors congruent to 3 mod 4.

Additive quadruples in the squares

Recall that by Cauchy-Schwarz, \lvert S+S\rvert \geq \frac{\lvert S\rvert^2}{Q_4(S)} where Q_4 is the number of additive quadruples, i.e. of quadruples of square a,b,c,d \in S with a+b=c+d, which is also the number of quadruples a,b,c,d\in [\sqrt{N}]^4 with a^2+b^2=c^2+d^2. Thus, we have at least Q_4(S)\geq N/\sqrt{\log N} additive quadruples.

We can try to do better. Indeed, rewrite the equation as

(a-c)(a+c)=(d-b)(d+b),\quad -N\leq a,b,c,d\geq N

(we replaced \sqrt{N} by N here). It amounts to

uv=wz,\quad -2N\leq u,v,w,z\geq 2N,\quad u+v,w+z \text{ even}

We’ll count the positive solutions here, the other solutions are easily obtained from them. We partition such quadruples depending on the gcd d=(u,w). So u=du_1,w=dw_1,(u_1,w_1)=1. Then v=w_1e,z=u_1e for some 1\leq e\leq \min (2N/w_1,2N/u_1). So the number of add quadruples should be


By Cauchy-Schwarz, the inner sum ought to be

\sum_{u_1,w_1\leq 2N/d}1_{(u_1,w_1)=1}min(\cdots)\ll \sqrt{6/\pi^2N^2/d^2}\sqrt{\sum \min(N/u_1,N/w_1)}

Now the second sum is not much bigger than

\sum_{1\leq u_1\leq w_1\leq N/d}N/u_1=\sum N/u_1(N/d-u_1)\sim N^2/d\log N

so the total number of additive quadruples should be

\ll \sum_d 1/d^{3/2}N^2/\log N\ll N^2\log N. Returning to the previous definition, we have the inequalities

N\sqrt{\log N}\ll Q_4(S)\ll N\log N

Notice that in a random set of this density you would expect less solutions: of the order of magnitude of N. So if a,b,c, are squares, a+b-c has increased probability to be a square.

Anyway the heuristics that Q_4(A)\sim \alpha^4 N^3 for A\subset [N] of density \alpha implies that that \lvert A+A\rvert\geq N which is not the case here.

Sums of two squares in \mathbb{Z}/n\mathbb{Z}

We know that in \mathbb{Z}/p\mathbb{Z}, there are \frac{p+1}{2} squares (including 0; here p\neq 2 is a prime). This suffices to show, as a simple case of Cauchy-Davenport, that everybody is congruent mod p to a sum of two squares. This of course doesn’t extend to non-prime n, as for instance not everybody is a sum of two squares modulo 4. Let us denote by \rho_a(d)=\lvert\{ (b,c)\in [d]^2\mid a=b^2+c^2\}\rvert the number of representations of a as sum of two squares mod d. Then we can see that for d=p a prime congruent to 1 mod 4 (so that -1 is a square mod p), \rho_0(p)=2p-1, while for any other residue a\neq 0 [p], we must have \rho_0(p)=p-1. On the other hand \rho_0(p)=1 for p\equiv 3[4], and hence \rho_a(p)=p+1 for all other a.

By the Chinese Remainder theorem, we know that \rho_a(d)=\prod_{p}\rho_a(p^{v_p(d)}). But it is far from obvious to estimate the \rho(p^\alpha). In fact it happens that

\rho_{\beta}(p^{\alpha})=(1-\chi_4(p^{\alpha})\sum_{j\geq 0}1_{p^j\mid \beta}\chi_4(p^j)

where \chi_4 is the only non-trivial character mod 4, which is also Legendre symbol mod 4.