There are between 1 and N about $\sqrt{N}$ perfect squares, which makes this set $S\subset [N]$ a quite sparse set, of density $N^{-1/2}$. Given that $\lvert S\rvert^2 \sim N$, it could be reasonable to think that $S+S$, with some luck, cover all of $[N]$ or at least have positive density. In fact it’s not the case.

To estimate the size of $S+S$, notice first that it contains all the primes congruent to 1 mod 4, which is a well-known theorem of Fermat, which is about one half of the primes, so that $\lvert 2S\rvert \gg \frac{N}{2\log N}$. This already a huge lot bigger than the squares alone.

In fact there are much more sums of two squares than these primes, because all their products are also sums of two squares. In fact, it contains all the products of such primes. A similar heuristics to the one leading to the prime number theorem suggests that the density of the numbers having no prime factor congruent to 3 mod 4 should be

$\prod_{\substack{p\equiv 3[4]\\p\leq N}} (1-1/p)$, and the log of this product should be roughly

$-\sum_{\substack{p\equiv 3[4]\\p\leq N}}1/p\sim -1/2\log\log N$

so we get a density  $\frac{\lvert S+S\rvert}{N}\sim 1/\sqrt{\log N}$. While this indeed is no proof, it should be believable. In general, a number n is a sum of two squares if and only if it is of the form $n=ab^2$ where a is product of primes congruent to 1 mod 4 and b is a product of primes congruent to 3 mod 4. Thus, the total number of sums of two squares seem to be bounded above (using the union bound) by something like

$\sum'_b \lvert 2S\cap [N/b^2]\rvert$

where the $\sum'$ denotes a sum restricted over number whose prime factors are all 3 mod 4. But this should be of the order of

$\sum'_b \frac{N/b^2}{\sqrt{\log N}}\leq \sum_b1/b^2N/\sqrt{\log N}$

and the sum is bounded, so certainly $\lvert 2S\rvert\leq C\frac{N}{\sqrt{\log N}}$. In fact, Landau has proven in 1908 that the number of sums of two squares until capital N is asymptotic to

$\frac{1}{\sqrt{2}}\prod_{p\equiv 3[4]}(1-p^{-2})^{-1/2}\frac{N}{\sqrt{\log N}}$

which fits quite well with what the heuristics indicate (the product over the restricted primes corresponds perfectly to the restricted sum).

Repartition of the sums of squares in residue classes

We see that the primes are analogous to the sums of two squares: indeed, the primes are the numbers which have no  factor, and the sums of two squares are the numbers which have no factor congruent to 3 mod 4 (plus other ones of course, the products of such numbers by 9, 49, 121 etc. of which we saw that it simply multiplied the size of the set by a constant). The same heuristic provides a density which then turns out to be correct. So why not, as Dirichlet did for the primes, ask for the repartition of the sums of squares in residue classes?

A sum of two squares $n=a^2+b^2$ can only be congruent to 0,1 or 2 mod 4, because squares are 0 or 1 mod 4. Let’s show that these values are not equiprobable in fact. First notice that $n$ can be written as $n=2^kn'$ where $n'\equiv 1 [4]$ is also a sum of two squares. So given the number $k_1$ of $n\equiv 1[4]$ until $N$ which are sums of two squares, we get $k_2=1/2 k_1$ and $k_0=(1/4+1/8+\cdots)k_1=1/2 k_1$. At least, the probabilities of being even or odd are the same. Another instance is that the sums of two squares are twice as dense in $n\equiv 1[4]$ than in $n\equiv 1[2]$. Also notice that everybody in $n\equiv 3[9]$ has $v_3(n)=1$ so nobody there can be a sum of two squares. The final word is by Prachar. Putting $B= \frac{1}{\sqrt{2}}\prod_{p\equiv 3[4]}(1-p^{-2})^{-1/2}$ the constant appearing above, he found

$\sum_{n\leq x,n\equiv a[q]}1_{S+S}(n)\sim B_q\frac{x}{\sqrt{\log x}}$

with

$B_q=Bq^{-1}\frac{(4,q)}{(2,q)}\prod_{p\equiv 3[4],p\mid q}(1+p^{-1})$

when $a\equiv 1\text{ mod } (4,q)$;  thus the density is the same as the general density in APs, except when the modulus $q$ is divisible by 4 (which causes a multiplication by 2 of the density) and/or it has factors congruent to 3 mod 4.

Recall that by Cauchy-Schwarz, $\lvert S+S\rvert \geq \frac{\lvert S\rvert^2}{Q_4(S)}$ where $Q_4$ is the number of additive quadruples, i.e. of quadruples of square $a,b,c,d \in S$ with $a+b=c+d$, which is also the number of quadruples $a,b,c,d\in [\sqrt{N}]^4$ with $a^2+b^2=c^2+d^2$. Thus, we have at least $Q_4(S)\geq N/\sqrt{\log N}$ additive quadruples.

We can try to do better. Indeed, rewrite the equation as

$(a-c)(a+c)=(d-b)(d+b),\quad -N\leq a,b,c,d\geq N$

(we replaced $\sqrt{N}$ by N here). It amounts to

$uv=wz,\quad -2N\leq u,v,w,z\geq 2N,\quad u+v,w+z \text{ even}$

We’ll count the positive solutions here, the other solutions are easily obtained from them. We partition such quadruples depending on the gcd $d=(u,w)$. So $u=du_1,w=dw_1,(u_1,w_1)=1$. Then $v=w_1e,z=u_1e$ for some $1\leq e\leq \min (2N/w_1,2N/u_1)$. So the number of add quadruples should be

$\sum_d\sum'_{u_1,w_1}(N/u_1,N/w_1)$

By Cauchy-Schwarz, the inner sum ought to be

$\sum_{u_1,w_1\leq 2N/d}1_{(u_1,w_1)=1}min(\cdots)\ll \sqrt{6/\pi^2N^2/d^2}\sqrt{\sum \min(N/u_1,N/w_1)}$

Now the second sum is not much bigger than

$\sum_{1\leq u_1\leq w_1\leq N/d}N/u_1=\sum N/u_1(N/d-u_1)\sim N^2/d\log N$

$\ll \sum_d 1/d^{3/2}N^2/\log N\ll N^2\log N$. Returning to the previous definition, we have the inequalities

$N\sqrt{\log N}\ll Q_4(S)\ll N\log N$

Notice that in a random set of this density you would expect less solutions: of the order of magnitude of N. So if $a,b,c,$ are squares, $a+b-c$ has increased probability to be a square.

Anyway the heuristics that $Q_4(A)\sim \alpha^4 N^3$ for $A\subset [N]$ of density $\alpha$ implies that that $\lvert A+A\rvert\geq N$ which is not the case here.

Sums of two squares in $\mathbb{Z}/n\mathbb{Z}$

We know that in $\mathbb{Z}/p\mathbb{Z}$, there are $\frac{p+1}{2}$ squares (including 0; here $p\neq 2$ is a prime). This suffices to show, as a simple case of Cauchy-Davenport, that everybody is congruent mod p to a sum of two squares. This of course doesn’t extend to non-prime $n$, as for instance not everybody is a sum of two squares modulo 4. Let us denote by $\rho_a(d)=\lvert\{ (b,c)\in [d]^2\mid a=b^2+c^2\}\rvert$ the number of representations of a as sum of two squares mod d. Then we can see that for $d=p$ a prime congruent to 1 mod 4 (so that $-1$ is a square mod p), $\rho_0(p)=2p-1$, while for any other residue $a\neq 0 [p]$, we must have $\rho_0(p)=p-1$. On the other hand $\rho_0(p)=1$ for $p\equiv 3[4]$, and hence $\rho_a(p)=p+1$ for all other $a$.

By the Chinese Remainder theorem, we know that $\rho_a(d)=\prod_{p}\rho_a(p^{v_p(d)})$. But it is far from obvious to estimate the $\rho(p^\alpha)$. In fact it happens that

$\rho_{\beta}(p^{\alpha})=(1-\chi_4(p^{\alpha})\sum_{j\geq 0}1_{p^j\mid \beta}\chi_4(p^j)$

where $\chi_4$ is the only non-trivial character mod 4, which is also Legendre symbol mod 4.