When one considers the sequence of values f(n) of some reasonable arithmetic function, like the divisor function \tau, or the function \omega(n) counting the prime factors of n, one may notice that, excluding a few abnormal values, the sequence looks very much like a neat, recognizable one. Or at least, smoothing the sequence by averaging produces a sequence which looks neat.

Thus \omega(n) has normal order \log\log n: almost all numbers n have about \log\log n prime factors; this is the theorem of Hardy-Ramanujan. It's even better than this: for almost all numbers, \omega(n) is within (\log\log n)^{1/2+\epsilon} of its model \log\log n. In fact the values of \frac{\omega(n)-\log\log n}{\sqrt{\log\log n}} follow a Gaussian distribution, as the Erdős-Kac theorem reveals.
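
For the numerically inclined reader, here is a quick empirical check (a sketch in Python; the cutoff 10^6 and the test points are arbitrary choices, and the convergence in the Erdős-Kac theorem is notoriously slow, so the match is only rough):

```python
# Empirical Hardy-Ramanujan / Erdős-Kac check: sieve omega(n), normalize,
# and compare the empirical distribution with the Gaussian one.
import math

N = 10**6
omega = [0] * N
for p in range(2, N):
    if omega[p] == 0:          # p is prime
        for m in range(p, N, p):
            omega[m] += 1      # each distinct prime divisor counts once

vals = [(omega[n] - math.log(math.log(n))) / math.sqrt(math.log(math.log(n)))
        for n in range(3, N)]
for x in (-1.0, 0.0, 1.0):
    empirical = sum(v <= x for v in vals) / len(vals)
    gaussian = 0.5 * (1 + math.erf(x / math.sqrt(2)))
    print(f"x = {x:+.1f}: empirical {empirical:.3f}, gaussian {gaussian:.3f}")
```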

On the contrary, the divisor function \tau doesn't have any normal order, but it does have an average order which is quite good-looking, namely \log n (while \log\tau has a normal order which, surprisingly, isn't \log\log n but a bit less, namely \log 2\cdot\log\log n, showing that a few exceptionally highly divisible numbers are able to make the average deviate substantially).

Now of course on the primes \tau and \omega collapse violently, being stuck at 2 and 1 respectively. The question is whether just after such a shocking value, at n=p+1, one recovers more normal arithmetic indices, or whether there are traces of the past catastrophe (or signs of the coming catastrophe just before it).

Of course p+1 isn't quite an arbitrary number, for instance it's surely even; but then, is (p+1)/2 just any number? It must also be said that it has higher chances of being a multiple of 3 than a generic number, in fact a one-in-two chance instead of one in three, because p+1\equiv 2\text{ or } 0\text{ mod }3 with equal probability.

From the point of view of \omega

There the answer is: yes, perfectly. Indeed, Heini Halberstam established that the Erdős-Kac theorem holds when n is constrained to range over the shifted primes p+1. That is, \omega(p+1) lies within M\sqrt{\log\log p} of \log\log p for a proportion p(M) of primes, this proportion being again given by the Gaussian law.

From the point of view of \tau

Not quite. For this, consider the Titchmarsh divisor problem, which consists in estimating \sum_{p\leq x} \tau(p+1). If \tau(p+1) were replaceable by \log(p+1), as the average order suggests, then this sum would be asymptotic to \sum_{p\leq x}\log p\sim x by the prime number theorem. It turns out that it is in fact asymptotic to \frac{\zeta(2)\zeta(3)}{\zeta(6)}x, the constant prefactor being well over 1.9, so that it can be said that p+1 has almost twice as many divisors as banal numbers of its size. Now remember it's always even; there are good heuristic reasons to believe that \sum_{n\leq x,\,n\equiv 0\,[2]}\tau(n)\sim \frac{3}{4}x\log x (see the appendix below), so that on average \tau(2n) is close to \frac{3}{2}\log n; but 1.9 is still substantially larger than 3/2. Here the higher chances of p+1 being divisible by 3, 5, etc. in comparison to a banal number weigh enough to make the average deviate.
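
For what it's worth, one can watch the Titchmarsh constant emerge numerically (a sketch; the cutoff 10^6 is an arbitrary choice and the convergence is slow):

```python
# Compute (1/x) * sum_{p <= x} tau(p+1) and compare with zeta(2)zeta(3)/zeta(6).
x = 10**6

tau = [0] * (x + 2)                    # divisor-counting sieve
for d in range(1, x + 2):
    for m in range(d, x + 2, d):
        tau[m] += 1

sieve = [True] * (x + 1)               # prime sieve
sieve[0] = sieve[1] = False
for p in range(2, int(x**0.5) + 1):
    if sieve[p]:
        sieve[p*p::p] = [False] * len(sieve[p*p::p])

S = sum(tau[p + 1] for p in range(2, x + 1) if sieve[p])
print(S / x)   # to be compared with zeta(2)*zeta(3)/zeta(6) = 1.9435...
```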

It would be interesting to determine whether \log \tau(p+1) has a normal order, in the same vein as Halberstam’s result.

Another criterion of banality: the probability of being squarefree

As we know, the density of squarefree numbers is 1/\zeta(2)=\prod_{q \text{ prime}}(1-q^{-2}). It is then reasonable to wonder if, among numbers of the form n=p+1, the proportion of squarefree numbers is the same. It's easy to see that it can't quite be so: indeed, p+1 has a one-in-two chance of being divisible by the square 4 (while a generic number has of course a one-in-four chance), a one-in-six chance of being divisible by 9, and in general a one-in-q(q-1) chance of being divisible by the square q^2 of a prime q. So one guesses that p+1 is a little less likely to be squarefree than a generic number; indeed, it's been proven by Mirsky that the proportion of squarefree numbers among numbers of the form p+1 is \prod_q (1-1/(q(q-1))), to be compared with \prod_{q \text{ prime}}(1-q^{-2}).
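
Here is a small numerical illustration (a sketch; both cutoffs are arbitrary choices):

```python
# Proportion of squarefree numbers among the shifted primes p+1,
# compared with Mirsky's constant prod_q (1 - 1/(q(q-1))).
X = 10**6

sf = [True] * (X + 2)                  # squarefree sieve: cross out multiples of q^2
for q in range(2, int((X + 1) ** 0.5) + 1):
    for m in range(q * q, X + 2, q * q):
        sf[m] = False

sieve = [True] * (X + 1)               # prime sieve
sieve[0] = sieve[1] = False
for p in range(2, int(X**0.5) + 1):
    if sieve[p]:
        sieve[p*p::p] = [False] * len(sieve[p*p::p])

primes = [p for p in range(2, X + 1) if sieve[p]]
prop = sum(sf[p + 1] for p in primes) / len(primes)

mirsky = 1.0
for q in primes[:10000]:               # truncated Euler product
    mirsky *= 1 - 1 / (q * (q - 1))
print(prop, mirsky)                    # both should be close to 0.3739...
```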

The property of being squarefree can be replaced by the property of being k-free for instance, the density among shifted primes being then \prod_q (1-1/(q^{k-1}(q-1))), to be compared with \prod_{q \text{ prime}}(1-q^{-k}).

Appendix: average of the divisor function on odd and even numbers

We know that

\displaystyle{x\log x\sim\sum_{n\leq x}\tau(n)=\sum_{\substack{n\equiv 0\,[2]\\n\leq x}}\tau(n)+\sum_{\substack{n\equiv 1\,[2]\\n\leq x}}\tau(n)=S_0(x)+S_1(x)}

And let us suppose that \sum_{n\equiv 0\,[2],n\leq x}\tau(n)\sim Ax\log x and \sum_{n\equiv 1\,[2],n\leq x}\tau(n)\sim Bx\log x.

Now the sum over even numbers can be decomposed into sums over numbers of 2-adic valuation 1,2,3… But the numbers of 2-adic valuation k are the numbers of the form 2^km with m odd, so

\displaystyle{\sum_{v_2(n)=k,n\leq x}\tau(n)=(k+1)\sum_{m\leq x/2^k,m\equiv 1[2]}\tau(m)\sim(k+1)B\frac{x}{2^k}\log x} for fixed k. Pretending this asymptotic also holds when k is allowed to grow with x, we infer from S_0(x)=2S_1(x/2)+3S_1(x/4)+4S_1(x/8)+\cdots that A=2\cdot\frac{B}{2}+3\cdot\frac{B}{4}+4\cdot\frac{B}{8}+\cdots=3B. Hence, since A+B=1, we get S_0(x)\sim \frac{3}{4}x\log x and S_1(x)\sim \frac{1}{4}x\log x.
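
A quick numeric sanity check of these two constants (a sketch; the cutoff is arbitrary and the convergence is only logarithmic):

```python
# Check S_0(x) / (x log x) -> 3/4 and S_1(x) / (x log x) -> 1/4.
import math

x = 10**6
tau = [0] * (x + 1)
for d in range(1, x + 1):
    for m in range(d, x + 1, d):
        tau[m] += 1

S0 = sum(tau[n] for n in range(2, x + 1, 2))   # even n
S1 = sum(tau[n] for n in range(1, x + 1, 2))   # odd n
print(S0 / (x * math.log(x)), S1 / (x * math.log(x)))  # ~ 0.75 and ~ 0.25
```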

It might be that this intuition has been proven more rigorously, I’d like to see where!

The genuine prime number theorem says that there are about N/\log N prime numbers between 1 and N. The prime number theorem in function fields is rather about the number a_n of irreducible monic polynomials of degree n in the polynomial ring A=\mathbb{F}_q[t] over a finite field \mathbb{F}_q. It is remarkable that the result is of the same shape q^n/n, which is N/\log_q N if you let N=q^n be the number of monic polynomials of degree n. But the proof, as often with the polynomial analogues of number theory, is much neater and even yields a nice exact formula for a_n:

\displaystyle{q^n=\sum_{d\mid n}da_d\text{ and by Moebius inversion } a_n=\frac{1}{n}\sum_{d\mid n}\mu(d)q^{n/d}}
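
The formula is easy to check by brute force for small q and n; here is a sketch for q=2, encoding a polynomial over \mathbb{F}_2 as a bitmask (bit i \leftrightarrow t^i):

```python
def mobius(n):
    res, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0               # square factor
            res = -res
        p += 1
    return -res if n > 1 else res

def mod2(f, g):
    """Remainder of f modulo g in F_2[t] (long division = shifted XORs)."""
    dg = g.bit_length() - 1
    while f and f.bit_length() - 1 >= dg:
        f ^= g << (f.bit_length() - 1 - dg)
    return f

def irreducible(f):
    """Trial division by every polynomial of degree between 1 and deg(f)/2."""
    n = f.bit_length() - 1
    return all(mod2(f, g) for g in range(2, 1 << (n // 2 + 1)) if g.bit_length() >= 2)

for n in range(1, 9):
    brute = sum(irreducible(f) for f in range(1 << n, 1 << (n + 1)))
    formula = sum(mobius(n // d) * 2**d for d in range(1, n + 1) if n % d == 0) // n
    print(n, brute, formula)           # the two counts agree
```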

First proof

This one requires a good deal of knowledge of the theory of finite fields.

We prove that \Phi=T^{q^n}-T is the product \Psi of all monic irreducible polynomials whose degrees divide n, which immediately provides the above formula by comparison of the degrees. To prove this, notice that the roots of \Phi are exactly the elements of the extension \mathbb{F}_{q^n} of \mathbb{F}_q, so that \Phi, as an element of \mathbb{F}_{q^n}[T], can be factored as \prod_{\alpha\in \mathbb{F}_{q^n}}(T-\alpha). On the other hand, if P is an irreducible polynomial of degree m\mid n, it splits into linear factors in A/(P)\sim \mathbb{F}_{q^m}\leq \mathbb{F}_{q^n}. So P has a root \alpha\in \mathbb{F}_{q^n} and all its other roots are images of \alpha under the Frobenius element, more precisely,

P=(T-\alpha)(T-\alpha^{q})(T-\alpha^{q^2})\cdots (T-\alpha^{q^{m-1}})

so that P\mid \Phi (at first sight in \mathbb{F}_{q^n}[T], but also in A because they are both in A – it is well known that the gcd is preserved under field extensions). Moreover, the monic irreducible P are pairwise coprime, so that their product \Psi still divides \Phi. Finally, we need to prove the converse divisibility, either in A or in \mathbb{F}_{q^n}[T], which is equivalent. So let's do it in the latter; take a factor T-\alpha with \alpha \in\mathbb{F}_{q^n}. Let m be the order of the Frobenius element on \alpha, so that \alpha^{q^m}=\alpha and no smaller integer has this property; since \alpha^{q^n}=\alpha, we have m\mid n. Then (T-\alpha)\mid (T-\alpha)(T-\alpha^{q})\cdots (T-\alpha^{q^{m-1}})\in A, which is indeed an irreducible polynomial of degree m\mid n, hence a factor of \Psi. Since the factors T-\alpha are pairwise coprime, their product \Phi still divides \Psi.

Second proof

This one proves exactly what we want, no more, and uses only power series. We need the notation \vert f\vert=q^{\deg f}.

Let

\displaystyle{\zeta_A(s)=\prod_{P\in A\text{ monic irreducible}}(1-\vert P\vert ^{-s})^{-1}=\sum_{f\in A\text{ monic}}\vert f\vert^{-s}}

be the zeta function (for s of real part >1). It is easy to see that \zeta_A=(1-q^{1-s})^{-1} by looking at the sum expression, and \zeta_A=\prod_{d\geq 1}(1-q^{-ds})^{-a_d} by looking at the product expression. Putting u=q^{-s}, both sides become nice power series so that we get the equality

\displaystyle{\frac{1}{1-qu}=\prod_{d=1}^{+\infty}(1-u^d)^{-a_d}}

and doing classical operations (logarithmic derivation and expansion in power series followed by termwise identification in particular) on both sides yields the result.
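
In detail (a routine verification, spelled out for convenience): taking logarithmic derivatives of both sides and multiplying by u gives

\displaystyle{\frac{qu}{1-qu}=\sum_{d\geq 1}\frac{da_d u^d}{1-u^d}}

and expanding both sides in power series,

\displaystyle{\sum_{n\geq 1}q^nu^n=\sum_{d\geq 1}da_d\sum_{k\geq 1}u^{dk}=\sum_{n\geq 1}\Big(\sum_{d\mid n}da_d\Big)u^n}

so that identifying the coefficients of u^n gives back q^n=\sum_{d\mid n}da_d.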

This proof seems to use about zero input of algebra, so its success looks like a miracle. The only step where a (tiny) bit of knowledge seems to be used is the transformation of the sum into an Euler product – it needs the existence and uniqueness of the decomposition of a monic polynomial into a product of irreducibles.

A kind of sieve method?

It could be reasonable to hope that the sieve of Eratosthenes would work better in the realm of polynomial rings. We'll try to adapt it here. Notice that in this case we have to count primes of degree up to a certain large bound, and not of exactly a certain large degree (the two quantities are in fact far from equivalent!).

Let's define \pi(n) to be the number of monic irreducible polynomials (let's call them primes) of degree at most n, and \delta(m)=1_{m=1}. We also use the Moebius function for monic polynomials, defined as usual in terms of the number of prime factors. The Moebius inversion formula \delta=\mu\star 1 still holds in the polynomial world. We also write

\displaystyle{\Phi=\prod_{P \text{ mon. irred.},\ \deg P\leq n/2}P}

Thus

\displaystyle{\pi(n)-\pi(n/2)+1=\sum_{\substack{f\text{ monic}\\ \deg f\leq n}}\delta((f,\Phi))=\sum_{\substack{d\mid\Phi\\ \deg d\leq n}}\mu(d)\,\frac{q^{n+1-\deg d}-1}{q-1}}

and if we replace each summand by its dominant part \frac{q}{q-1}\mu(d)q^{n-\deg d} and forget about the constraint \deg d\leq n, we obtain, up to the harmless factor \frac{q}{q-1}, exactly

\displaystyle{q^n\prod_{P \text{mon. irred,deg} P\leq n/2}(1-\vert P\vert ^{-1})}

this looks quite nice, but the error term, just like in the integer case, can’t easily be bounded by anything better than O(2^{\pi(n/2)}) which we know will be vastly bigger than the main term. By the way this product can very easily be shown to be \ll 1/n but the exact asymptotic is not any easier than the PNT itself… So this sieve seems to have very little interest, given the convenience of the other methods.

Two posts in French have already appeared on this topic, showing that the maximal number of unit vectors having pairwise negative (resp. non-positive) inner products grows linearly with the dimension of the space \mathbb{R}^n they live in. This is good fun; however, as mentioned in a comment, it becomes even better fun when one bothers asking, for \epsilon >0, how big a family of unit vectors v_1,\cdots,v_k\in \mathbb{R}^n can be if…

  1. … for all i\neq j, one requires v_i\cdot v_j<\epsilon;
  2. … for all i\neq j, one requires v_i\cdot v_j<-\epsilon;

Let's deal with the first question. This involves the remarkable fact that on the sphere of very large dimension, almost all the area is concentrated on a small band near the equator (or any equator). This is the phenomenon called concentration of measure; so pick a vector v_1\in\mathbb{S}^n. To continue your family, you have to pick a vector outside the spherical cap B=\{x\mid x\cdot v_1\geq\epsilon\}. The concentration of measure implies that the forbidden area is of order C\exp(-cn\epsilon^2). When you have already picked k vectors, the forbidden area for the choice of the next one is at most kC\exp(-cn\epsilon^2), which is strictly less than 1 until k=C^{-1}\exp(cn\epsilon^2). Until then not all the sphere is forbidden, so you can certainly accommodate C^{-1}\exp(cn\epsilon^2) vectors: exponentially (in the dimension) many instead of linearly many.
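
A quick Monte Carlo illustration of this exponential behaviour (a sketch; the values of n, k and \epsilon are arbitrary choices):

```python
# In dimension n = 500, thousands of random unit vectors are typically
# pairwise eps-orthogonal: random directions concentrate near any equator.
import numpy as np

rng = np.random.default_rng(0)
n, k, eps = 500, 2000, 0.3
V = rng.standard_normal((k, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # k random unit vectors
G = V @ V.T                                     # pairwise inner products
np.fill_diagonal(G, 0.0)
print("max pairwise inner product:", G.max())   # typically below eps
print("violating pairs:", int((G > eps).sum()) // 2)
```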

The second one is easier. Indeed,

0\leq \lVert \sum_{i=1}^k v_i \rVert^2=k+2\sum_{i<j}v_i\cdot v_j\leq k-\epsilon k(k-1)

which implies k\leq \epsilon^{-1}+1 so we get a constant bound, certainly much lower than n+1 for large dimensions.

I devised yesterday a false calculation of the area of the sphere S^2=\{(x,y,z)\mid x^2+y^2+z^2=1\}. The idea is quite natural. Decompose the sphere into the circles \{(x,y)\mid x^2+y^2=1-z^2\} of radius \sqrt{1-z^2}. Thus it seems that the area of the sphere should be 2\pi\int_{-1}^1\sqrt{1-z^2}dz. However, by the change of variable z=\sin \theta, we easily notice that \int_{-1}^1\sqrt{1-z^2}dz=\int_{-\pi/2}^{\pi/2}\cos^2\theta\,d\theta=\pi/2, so we get \pi^2 instead of 4\pi, the well-known value we should find. What's wrong?

Well, why should it be true in the first place? The intuitive reason why this should be true consists in approximating our sphere by a ziggurat of cylinders, of height 1/n and radii \sqrt{1-(i/n)^2}, thus getting something like \frac{2\pi}{n}\sum_{i=-n}^{n-1}\sqrt{1-(i/n)^2} for the area of our tower. This Riemann sum does converge to 2\pi\int_{-1}^1\sqrt{1-z^2}dz, but there's no reason why it should be the area of the sphere. There's no reason why it should be the area of anything definite either… Notice that by cutting this picture along a plane through the axis, it amounts to approximating a (quarter of a) circle by a bunch of vertical segments… whose total length is of course constantly equal to 1, instead of converging to \pi/2.
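
One can watch the discrepancy numerically (a sketch; the value of n is arbitrary):

```python
# The "ziggurat" Riemann sum converges to pi^2 = 9.87..., not to 4*pi = 12.57...
import math

n = 10**6
riemann = (2 * math.pi / n) * sum(math.sqrt(1 - (i / n) ** 2) for i in range(-n, n))
print(riemann, math.pi**2, 4 * math.pi)
```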

In the same vein, take a half circle based on the segment [-1,1], then two smaller half circles under it, and split again each arc into two arcs, and so on. This yields a sequence of curves \gamma_n :[-1,1]\rightarrow \mathbb{R}^2 which converges in the L^{\infty} norm to the straight line segment, but which has constant length \pi, not converging to the length (namely 2) of the limit curve. Indeed, the length of a curve is not a continuous function of the curve, only lower semi-continuous.

So we have to think hard about the notion of surface area of a (say convex) body in space. Certainly the right way to proceed is via the Minkowski content (in other words, in this case: the derivative of the volume of the enclosed ball with respect to the radius) or the Hausdorff measure.

In the Green-Tao method appear quite often expressions such as \mathbb{E}\prod_{i}1_{d_i\mid\psi_i(n)} for some system of affine linear forms \psi_i:\mathbb{Z}^d\rightarrow\mathbb{Z}, where the d_i are some positive integers. The most basic example would be \mathbb{E}_{n\leq N}1_{d\mid n}=\frac{1}{N}\lfloor \frac{N}{d}\rfloor=\frac{1}{d}+O(1/N)=\mathbb{E}_{n\in\mathbb{Z}/d\mathbb{Z}}1_{n=0}+O(1/N), where the last equality is here only to insist heavily on the form a general statement will have to take. A bit more generally, one easily sees that if f:\mathbb{Z}\rightarrow \mathbb{Z} is d-periodic, then its average up to N is basically its average over one period, up to O(d/N). Following the ideas of Green-Tao, in fact we can show that for our general problem, the asymptotic should analogously be \mathbb{E}_{n\in (\mathbb{Z}/m\mathbb{Z})^d}1_{\forall i,d_i\mid\psi_i(n)}, where m is the lcm of the d_i.

Thus, applying this method, one frequently has to deal with such « local densities », intersections of the zero sets of affine forms. More generally again, one may have to deal with things like \alpha_P(p^2):=\mathbb{P}_{n\in(\mathbb{Z}/p^2\mathbb{Z})^d}(P(n)=0) for some polynomial P\in\mathbb{Z}[X_1,\dots,X_d]. Say P=\prod_{i=1}^t\phi_i where the \phi_i are affine linear forms, so that P is of degree t. Suppose also that no two of them are affinely dependent, even when reduced modulo p. Then to compute \alpha_P(p^2), notice that in order for n to satisfy P(n)\equiv 0\,[p^2], either two of the forms must vanish mod p at n, or one of them must vanish mod p^2 there.

To analyse the first possibility, reduce the forms modulo p, so that they become affine forms over the field \mathbb{F}_p. They are non-zero forms, so the image of each linear part is of dimension 1 and the kernels of dimension d-1; so the zero locus of each of our affine forms is a hyperplane, and the hypothesis of no affine dependence ensures that the intersection of any two of these hyperplanes is of dimension d-2. So there are only p^{d-2} common zeros for each pair of forms. But each point of \mathbb{F}_p^d gives rise to p^d antecedents in (\mathbb{Z}/p^2\mathbb{Z})^d under the map of reduction modulo p. Finally, this possibility provides a summand of at most \binom{t}{2}p^{-2} to \alpha_P(p^2).

To compute now \mathbb{P}_{n\in (\mathbb{Z}/p^2\mathbb{Z})^d}(\phi_i(n)=0), notice that any such zero must arise from a zero mod p. Now there are p^{d-1} zeros mod p, because the set of zeros is a hyperplane. Then we can use a form of Hensel's lemma in several variables. To obtain such a statement, we rely on the following simple identity, valid for any Q\in\mathbb{Z}[X_1,\dots,X_d]:

Q(x+p\overrightarrow{k})\equiv Q(x)+p\,\overrightarrow{\mathrm{grad}}\,Q(x)\cdot \overrightarrow{k}\text{ mod }p^2

which shows that if Q(x)\equiv 0\text{ mod }p, say Q(x)= mp\text{ mod }p^2, and \overrightarrow{\mathrm{grad}}\,Q(x)\neq 0\text{ mod }p, then the vector \overrightarrow{k} is constrained to live in a hyperplane if Q(x+p\overrightarrow{k}) is to vanish mod p^2. So there are only p^{d-1} elements y\in(\mathbb{Z}/p^2\mathbb{Z})^d with x\equiv y\ [p] and Q(y) \equiv 0\,[p^2]. Of course if Q is affine then the gradient is simply the linear part. Thus the number of roots of \phi_i modulo p^2 is p^{2d-2}. By the union bound, we see that:

\mathbb{P}_{n\in (\mathbb{Z}/p^2\mathbb{Z})^d}(\exists i:\phi_i(n)=0)\leq tp^{-2}. In the end, \alpha_P(p^2)\ll_t p^{-2}.
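
A brute-force check on a toy example (a sketch; the two forms \phi_1(n)=n_1 and \phi_2(n)=n_1+n_2+1 in dimension d=2 are arbitrary, chosen affinely independent mod every p):

```python
# Verify that alpha_P(p^2) * p^2 stays bounded for P = phi_1 * phi_2.
from itertools import product

def alpha_P(p, d=2):
    mod = p * p
    hits = sum((n[0] * (n[0] + n[1] + 1)) % mod == 0
               for n in product(range(mod), repeat=d))
    return hits / mod**d

for p in (3, 5, 7, 11, 13):
    print(p, alpha_P(p) * p * p)       # bounded, consistent with << p^{-2}
```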

To find more about this calculation and similar ones and the reason why I am interested in them, look at this article here.


What I claim is that, letting p_n be the nth prime,

\frac{\lvert\{p_n\leq N\mid p_{n+1}-p_n\leq \log^{1-\epsilon} N\}\rvert}{N/\log N}\rightarrow 0

in other words, the proportion of the gaps between consecutive primes up to N which are smaller than \log ^{1-\epsilon} N tends to 0. Thus although we know that there are infinitely many bounded gaps, we see here that gaps are almost all of size at least logarithmic.

To prove it, we recall a classical application of Selberg’s sieve:

\pi_m(x):=\lvert\{p \leq x\mid p+m\text{ is also a prime}\}\rvert \ll \frac{x}{\log^2 x}\prod_{p\mid m}(1+1/p)

where the implied constant is absolute (does not depend on m).

Now for m\leq\log N this product is obviously at most \prod_{p\leq C\log N}(1+1/p) for some constant C (using the obvious bound \omega (m)\ll \log m), which by Mertens' theorem is \ll \log\log N. But now the numerator of our fraction is at most

\displaystyle{\sum_{m\leq \log^{1-\epsilon} N}\pi_m(N)\ll\frac{N}{\log^2 N} \sum_{m\leq \log^{1-\epsilon} N} \log\log m\ll \frac{N}{\log^2 N}\log^{1-\epsilon/2}N}

The last term is simply \frac{N}{\log^{1+\epsilon/2}N} which is indeed negligible compared to the number of primes until N, whence the result.

Notice that we could have argued a bit differently: indeed, \sum_{m\leq M}\prod_{p\mid m}(1+1/p)=\sum_{d}\lfloor M/d\rfloor\frac{\mu(d)^2}{d}\ll M (instead of M\log\log M previously). Thus, the number of primes up to N such that the next prime is at distance at most M is bounded by \frac{N}{(\log N)^2}M, which is o(N/\log N) as soon as M=o(\log N).

As a conclusion, gaps of size p_{n+1}-p_n=o(\log p_n) have density 0.
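
Numerically the phenomenon is already visible, although only barely, since the decay is extremely slow (a sketch; \epsilon and the cutoffs are arbitrary choices):

```python
# Proportion of prime gaps up to N that are smaller than log^{1-eps} N.
import math

eps = 0.3
for N in (10**4, 10**5, 10**6):
    sieve = [True] * (N + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(N**0.5) + 1):
        if sieve[p]:
            sieve[p*p::p] = [False] * len(sieve[p*p::p])
    ps = [p for p in range(2, N + 1) if sieve[p]]
    T = math.log(N) ** (1 - eps)
    small = sum(q - p <= T for p, q in zip(ps, ps[1:]))
    print(N, small / len(ps))
```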

There are between 1 and N about \sqrt{N} perfect squares, which makes this set S\subset [N] a quite sparse set, of density N^{-1/2}. Given that \lvert S\rvert^2 \sim N, it could be reasonable to think that S+S, with some luck, covers all of [N] or at least has positive density. In fact it's not the case.

To estimate the size of S+S, notice first that it contains all the primes congruent to 1 mod 4 (a well-known theorem of Fermat), and these make up about one half of the primes, so that \lvert 2S\rvert \gg \frac{N}{2\log N}. This is already a huge lot bigger than the number of squares alone.

In fact there are many more sums of two squares than these primes, because all products of such primes are also sums of two squares. A heuristic similar to the one leading to the prime number theorem suggests that the density of the numbers having no prime factor congruent to 3 mod 4 should be

\prod_{\substack{p\equiv 3[4]\\p\leq N}} (1-1/p), and the log of this product should be roughly

-\sum_{\substack{p\equiv 3[4]\\p\leq N}}1/p\sim -1/2\log\log N

so we get a density \frac{\lvert S+S\rvert}{N}\approx 1/\sqrt{\log N}. While this indeed is no proof, it should be believable. In general, a number n is a sum of two squares if and only if it is of the form n=ab^2 where a has no prime factor congruent to 3 mod 4 and b is a product of primes congruent to 3 mod 4. Thus, the total number of sums of two squares seems to be bounded above (using the union bound) by something like

\sum'_b \lvert 2S\cap [N/b^2]\rvert

where the \sum' denotes a sum restricted to numbers whose prime factors are all 3 mod 4. But this should be of the order of

\sum{}'_b \frac{N/b^2}{\sqrt{\log N}}\leq \Big(\sum_b \frac{1}{b^2}\Big)\frac{N}{\sqrt{\log N}}

and the sum is bounded, so certainly \lvert 2S\rvert\leq C\frac{N}{\sqrt{\log N}}. In fact, Landau proved in 1908 that the number of sums of two squares up to N is asymptotic to

\frac{1}{\sqrt{2}}\prod_{p\equiv 3[4]}(1-p^{-2})^{-1/2}\frac{N}{\sqrt{\log N}}

which fits quite well with what the heuristics indicate (the product over the restricted primes corresponds perfectly to the restricted sum).
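
For the curious, a numerical comparison (a sketch; the cutoffs for N and for the Euler product are arbitrary, and the secondary terms are still visible at this size):

```python
# Count sums of two squares up to N against Landau's asymptotic.
import math

N = 10**6
is_sum = [False] * (N + 1)
a = 0
while a * a <= N:
    b = a
    while a * a + b * b <= N:
        is_sum[a * a + b * b] = True
        b += 1
    a += 1
count = sum(is_sum[1:])

K = 1 / math.sqrt(2)                     # truncated Landau constant
for p in range(3, 10**5, 2):
    if p % 4 == 3 and all(p % q for q in range(3, int(p**0.5) + 1, 2)):
        K *= (1 - p**-2) ** -0.5
print(count, K * N / math.sqrt(math.log(N)))
```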

Distribution of the sums of two squares in residue classes

We see that the primes are analogous to the sums of two squares: indeed, the primes are the numbers which have no nontrivial factor, and the sums of two squares are the numbers which have no prime factor congruent to 3 mod 4 (plus others of course: the products of such numbers by 9, 49, 121, etc., which as we saw merely multiplies the size of the set by a constant). The same heuristic provides a density which then turns out to be correct. So why not, as Dirichlet did for the primes, ask for the distribution of the sums of two squares in residue classes?

A sum of two squares n=a^2+b^2 can only be congruent to 0, 1 or 2 mod 4, because squares are 0 or 1 mod 4. Let's show that these values are in fact not equiprobable. First notice that n can be written as n=2^kn' where n'\equiv 1\,[4] is also a sum of two squares. So given the number k_1 of n\equiv 1\,[4] up to N which are sums of two squares, we get k_2=\frac12 k_1 and k_0=(1/4+1/8+\cdots)k_1=\frac12 k_1. At least, the probabilities of being even or odd are the same. Another instance is that the sums of two squares are twice as dense in the class n\equiv 1\,[4] as in n\equiv 1\,[2]. Also notice that everybody in the class n\equiv 3\,[9] has v_3(n)=1, so nobody there can be a sum of two squares. The final word is by Prachar. Putting B= \frac{1}{\sqrt{2}}\prod_{p\equiv 3[4]}(1-p^{-2})^{-1/2}, the constant appearing above, he found

\sum_{n\leq x,n\equiv a[q]}1_{S+S}(n)\sim B_q\frac{x}{\sqrt{\log x}}

with

B_q=Bq^{-1}\frac{(4,q)}{(2,q)}\prod_{p\equiv 3[4],p\mid q}(1+p^{-1})

when a\equiv 1\text{ mod } (4,q); thus the density is the same as the general density in APs, except when the modulus q is divisible by 4 (which causes a multiplication by 2 of the density) and/or it has prime factors congruent to 3 mod 4.

Additive quadruples in the squares

Recall that by Cauchy-Schwarz, \lvert S+S\rvert \geq \frac{\lvert S\rvert^4}{Q_4(S)} where Q_4 is the number of additive quadruples, i.e. of quadruples of squares a,b,c,d \in S with a+b=c+d, which is also the number of quadruples (a,b,c,d)\in [\sqrt{N}]^4 with a^2+b^2=c^2+d^2. Thus, we have at least Q_4(S)\gg N\sqrt{\log N} additive quadruples.

We can try to do better. Indeed, rewrite the equation as

(a-c)(a+c)=(d-b)(d+b),\quad -N\leq a,b,c,d\leq N

(we replaced \sqrt{N} by N here). It amounts to

uv=wz,\quad -2N\leq u,v,w,z\leq 2N,\quad u+v,w+z \text{ even}

We'll count the positive solutions here, the other solutions being easily obtained from them. We partition such quadruples depending on the gcd d=(u,w). So u=du_1,w=dw_1,(u_1,w_1)=1. Then v=w_1e,z=u_1e for some 1\leq e\leq \min (2N/w_1,2N/u_1). So the number of additive quadruples should be

\displaystyle{\sum_d\ \sum_{\substack{u_1,w_1\leq 2N/d\\(u_1,w_1)=1}}\min\left(\frac{2N}{u_1},\frac{2N}{w_1}\right)}

Dropping the coprimality condition, the inner sum is at most

\displaystyle{\sum_{u_1,w_1\leq 2N/d}\frac{2N}{\max(u_1,w_1)}\ll N\cdot\frac{2N}{d}}

since for each value w of \max(u_1,w_1)\leq 2N/d there are at most 2w pairs (u_1,w_1), each contributing 2N/w. So the total number of additive quadruples should be

\ll \sum_{d\leq 2N}\frac{N^2}{d}\ll N^2\log N. Returning to the previous normalization (where a,b,c,d\leq\sqrt{N}), we get the inequalities

N\sqrt{\log N}\ll Q_4(S)\ll N\log N

Notice that in a random set of this density you would expect fewer solutions: of the order of magnitude of N. So if a,b,c are squares, a+b-c has an increased probability of being a square.

Anyway, the heuristic that Q_4(A)\sim \alpha^4 N^3 for A\subset [N] of density \alpha would imply that \lvert A+A\rvert\gg N, which is not the case here.

Sums of two squares in \mathbb{Z}/n\mathbb{Z}

We know that in \mathbb{Z}/p\mathbb{Z}, there are \frac{p+1}{2} squares (including 0; here p\neq 2 is a prime). This suffices to show, as a simple case of Cauchy-Davenport, that everybody is congruent mod p to a sum of two squares. This of course doesn't extend to non-prime n, as for instance not everybody is a sum of two squares modulo 4. Let us denote by \rho_a(d)=\lvert\{ (b,c)\in [d]^2\mid a\equiv b^2+c^2\,[d]\}\rvert the number of representations of a as a sum of two squares mod d. Then we can see that for d=p a prime congruent to 1 mod 4 (so that -1 is a square mod p), \rho_0(p)=2p-1, while for any other residue a\neq 0\,[p], we must have \rho_a(p)=p-1. On the other hand \rho_0(p)=1 for p\equiv 3\,[4], and hence \rho_a(p)=p+1 for all other a.

By the Chinese Remainder theorem, we know that \rho_a(d)=\prod_{p}\rho_a(p^{v_p(d)}). But it is far from obvious to estimate the \rho_a(p^\alpha). In fact it happens that (say for p odd and \beta\not\equiv 0\,[p^{\alpha}])

\rho_{\beta}(p^{\alpha})=p^{\alpha}\left(1-\frac{\chi_4(p)}{p}\right)\sum_{j\geq 0}1_{p^j\mid \beta}\,\chi_4(p^j)

where \chi_4 is the only non-trivial character mod 4 (one can think of it as a Legendre symbol mod 4).
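
The values quoted above for \rho_a(p) are easy to confirm by brute force (a sketch; p=13 and p=7 are arbitrary primes of each class):

```python
# rho_a(p) = number of (b, c) mod p with b^2 + c^2 = a mod p.
def rho(a, d):
    return sum((b * b + c * c - a) % d == 0 for b in range(d) for c in range(d))

for p in (13, 7):                       # 13 = 1 mod 4, 7 = 3 mod 4
    print(p, [rho(a, p) for a in range(p)])
# p = 13: rho_0 = 2*13 - 1 = 25, and rho_a = 12 for a != 0
# p = 7:  rho_0 = 1,            and rho_a = 8  for a != 0
```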

This post is an echo to Davide Ravotti’s brilliant talk one week ago.

Equidistribution has already been discussed on this blog (during the good old francophone epoch…) and it is a fascinating property. It must be observed that it isn't a property of a countable set but of a sequence; in other words it depends crucially on the order in which the terms are considered.

The rationals between 0 and 1 are countable for instance. The usual way to see it is, as Cantor did, to naturally embed \mathbb{Q} in \mathbb{Z}^2 and to order the couples of positive integers (p,q) by increasing p+q. But number theorists know that the right way to order rationals is by increasing denominator. Indeed, in diophantine approximation, the denominator determines the quality of the approximation. The fractions p/q, at fixed q, follow each other at distance 1/q so these produce sequences which are of course evenly distributed and denser and denser. So let’s introduce the Farey sequence which is the most relevant ordering of the rationals.

The Farey sequence

We denote by \beta_{q,i}, for 1\leq q and 1\leq i\leq \phi(q), the fractions a_i(q)/q, where 1=a_1(q) < a_2(q) < \cdots < a_{\phi(q)}(q)=q-1 are the integers coprime to and smaller than q. This yields an enumeration of the rationals: 1,1/2,1/3,2/3,1/4,3/4,1/5,2/5,3/5,4/5,1/6… In other words, our sequence is defined by \alpha_k=\beta_{q,i} where q is the only integer such that \sum_{n=1}^{q-1}\phi (n) < k\leq \sum_{n=1}^q\phi (n) and i=k-\sum_{n=1}^{q-1}\phi (n). We may want to add \alpha_0=0. The Farey sequence of order n is \mathcal{F}_n=(\alpha_k)_{1\leq k\leq \sum_{q\leq n}\phi (q)}. Every rational appears in \mathcal{F} because every rational can be put in lowest terms p/q with (p,q)=1.

The Farey sequence of order n yields a dissection of the unit interval (or rather of the circle if you identify both ends, which we do in the end if we use Weyl's characterisation of equidistribution) in the following way: we look at all the mediants \frac{h+h'}{k+k'} of successive pairs. The intervals between two mediants will be our Farey arcs and each fraction of the Farey sequence will be contained in one arc. For instance 0=1=1/1 is between the points 1-1/n and 1/n, so that the mediants formed with them and 1/1 are \frac{n}{n+1} and \frac{1}{n+1}, hence the arc containing 0 is (\frac{n}{n+1},\frac{1}{n+1}). It is easy to see that the arc containing h/k\in \mathcal{F}_n is made of two parts (one on each side of the Farey fraction contained in it), both of length between \frac{1}{k(2n-1)} and \frac{1}{k(n+1)}. This indicates a certain uniformity. It also provides yet another proof of Liouville's theorem by the way (see Hardy & Wright).

Neville proved in 1949 that the Farey sequence \mathcal{F} satisfies Weyl's criterion and hence is indeed equidistributed. In fact the equidistribution remains true however we order the fractions inside each block \beta_{q,\cdot}.

Connection to the Riemann hypothesis

Jérôme Franel observed in 1924 that the Riemann hypothesis would imply (and is in fact equivalent to the statement) that on average the i-th fraction a_{i,n} of \mathcal{F}_n is very close to i/m_n, where m_n=\sum_{q\leq n}\phi(q). So the Farey sequence is quite close to the evenly spaced sequence of fractions of the same length. Precisely, putting d_{i,n}=a_{i,n}-i/m_n, the Riemann hypothesis is equivalent to \sum_i d_{i,n}^2=O(n^{-1+\epsilon}) for all \epsilon >0.
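
Franel's sums are easy to explore numerically (a sketch; the orders n are arbitrary choices and the decay predicted under RH is slow):

```python
# Build F_n and compute Franel's sum of squared discrepancies.
from math import gcd

def farey(n):
    """Fractions a/q in (0,1] in lowest terms with q <= n, sorted."""
    return sorted(a / q for q in range(1, n + 1)
                  for a in range(1, q + 1) if gcd(a, q) == 1)

for n in (10, 100, 1000):
    F = farey(n)
    m = len(F)
    franel = sum((F[i] - (i + 1) / m) ** 2 for i in range(m))
    print(n, m, franel)
```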

Other sequences of rationals and generalisations to higher dimensions

Ordering the elements \alpha=\frac{1}{q}(p_1,\cdots,p_n) \in \mathbb{Q}^n\cap [0,1]^n amounts to ordering the elements (p_1,\cdots, p_n,q)\in\mathbb{Z}^{n+1}\cap \text{Cone}([0,1]^n\times\{1\}) (in both cases we want to restrict to (p_1,\cdots, p_n,q)=1, which we call primitive tuples). Here an « optical » description is useful, as it strikingly already is in the theory of Farey sequences, to prove that if two elements p/q,p'/q'\in\mathcal{F}_n are consecutive, then p'q-pq'=1.

[Figure: the Farey sequence of order 8, viewed as rays from the origin.]

As the picture of the Farey sequence of order 8 above shows, one can imagine an eye looking from the origin into the first quadrant. It only sees the first integer point on each line; hence 2/8 is hidden from its sight by 1/4, its primitive form. Two fractions p/q,p'/q' are consecutive if and only if there's no integral point in the parallelogram spanned by both vectors, which happens if and only if it has area 1, which implies p'q-pq'=1.

To order the elements, we use a piecewise linear function u : \mathbb{R}^{n+1}\rightarrow \mathbb{R} with integer coefficients. Thus the Farey case is u(x_1,\cdots,x_{n+1})=x_{n+1} and the « Cantor » case is u(x_1,\cdots,x_{n+1})=\sum x_i. Given such a function, we order the integral points according to the value of the function on them; in other words we partition the integral points of the cone into level sets of u. So we need each of these level sets to have a compact intersection with the cone, so that we have a finite number of integral points on each level set…

[Figure: the cone over [0,1]\times\{1\}, and some level sets of some piecewise linear functions.]

Goal: given a simplex S\subset [0,1]^n, compute

\lim \frac{\lvert \{k\leq N\mid \alpha_k\in \text{Cone} (S)\}\rvert}{N}

If this is proportional to the volume of S, this means that we have equidistribution of the sequence. Indeed it’s the generalisation of case n=1, where simplices are simply intervals.

From now on, we restrict without loss of generality to a linear u. The key step is to compute

P_{S_1}(t)=\lvert\{a\in\mathbb{Z}^{n+1}\cap tS_1\mid a\text{ primitive}\}\rvert

for t an integer, where S_1=\{u=1\}\cap \text{Cone}(S), so that tS_1=\{u=t\}\cap \text{Cone}(S). We can in fact drop the primitivity condition, and retrieve the cardinality we are after by Moebius inversion.

Ehrhart’s theory

This is precisely the scope of Ehrhart theory, dealing with things such as L_P(t)=\lvert\mathbb{Z}^{n}\cap tP\rvert for P\subset\mathbb{R}^n a rational polyhedron (= with rational vertices).

Notice that Eugène Ehrhart was an Alsatian high-school teacher who didn't complete his PhD before his 60th birthday…

Examples

  • P=[0,1]^n then L_P(t)=(t+1)^n
  • if P is the standard simplex then a classical counting argument yields L_P(t)=\binom{t+n}{t}
  • if P is a convex polygon of the plane with integral vertices then Pick’s formula says that L_P(t)=\text{Area}(P)t^2+\frac{\lvert\{\text{integral points in } \partial P\}\rvert}{2}t+1

This looks polynomial and it isn’t a coincidence.
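
These small cases are painless to confirm by brute force (a sketch; n=3 and the range of t are arbitrary choices):

```python
# Ehrhart counts: cube [0,1]^n and standard simplex {x >= 0, sum x_i <= 1}.
from itertools import product
from math import comb

n = 3
for t in range(7):
    cube = (t + 1) ** n                 # lattice points of t * [0,1]^n
    simplex = sum(1 for x in product(range(t + 1), repeat=n) if sum(x) <= t)
    print(t, cube, simplex, comb(t + n, n))   # simplex count equals C(t+n, n)
```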

Ehrhart’s theorems:

  1. If P\subset\mathbb{R}^n is an integral polyhedron of dimension d, then L_P(t) is a rational polynomial of degree d, whose dominating coefficient is the relative volume \nu(P) of the polyhedron.
  2. If P\subset\mathbb{R}^n is a rational polyhedron of dimension d, then L_P(t) is a rational quasi-polynomial of degree d, i.e. there exist functions c_i : \mathbb{Z}\rightarrow\mathbb{Q} which are periodic, of periods dividing the lcm of the denominators of the vertices, such that L_P(t)=\sum_{i=0}^d c_i(t)t^i.

The second statement contains the first. The relative volume \nu(P) is defined as follows: there exists an affine isomorphism \psi : \text{Aff}(P)\rightarrow \mathbb{R}^d such that \psi(\mathbb{Z}^{n}\cap \text{Aff}(P))=\mathbb{Z}^d, and then \nu (P):=\text{Vol} (\psi(P)), which doesn't depend on the choice of \psi, because two possible choices differ by a transformation of determinant \pm 1, hence give the same volume.

Notice that for a general convex set C\subset\mathbb{R}^n of non-zero volume, the arguments from Tao-Vu's book Additive combinatorics or Green-Tao's Linear equations in primes (appendix A) give that L_C(t)=\text{Vol}(C)t^{n}+O(t^{n-1}). It is for instance quite well known (Gauss circle problem) how many integral points there are asymptotically in a big disc of radius R: about \pi R^2.

Finally, performing the Moebius inversion:

P_{S_1}(t)=\sum_{s\mid t}\mu(t/s)L_{S_1}(s)=\sum \mu(t/s)(\nu(S_1)s^d+O(s^{d-1}))

and finally, summing over the levels, \sum_{t\leq T}P_{S_1}(t) =\nu(S_1) CT^{d+1}+O(T^d\log T), where CT^{d+1}\sim\sum_{t\leq T}\sum_{s\mid t}\mu(t/s)s^d with C=\frac{1}{(d+1)\zeta(d+1)}, according to some classical number theory (cf the exercises of Murty).

At the end of the day, we get that \lim \frac{\lvert \{k\leq N\mid \alpha_k\in \text{Cone} (S)\}\rvert}{N}=\frac{\nu (S_1)}{\nu ([0,1]^n)}. To be quite honest, it isn't exactly the case when S is not integral but simply rational. Indeed, the leading coefficient c_d(t) in Ehrhart's (quasi-)polynomial of a polyhedron P is not always constantly equal to the relative volume \nu(P). To see this, take S=[0,1]\times \{1/2\}\subset\mathbb{R}^2. This is a nice rational polyhedron of dimension 1 and relative volume 1 (morally it's the volume when seen in its « natural environment », its affine hull). Then tS contains an integral point if and only if t is even, so L_S(t)=1_{t\equiv 0[2]}(t+1). So instead of \nu (S_1), we should get here \nu(bS_1)/b^{d+1} where b is the first integer such that bS_1 contains an integral point.

When do we get equidistribution? Probably essentially when \nu(S_1)=\nu(S) for all simplices S. This is certainly only the case when S_1=S all the time, which seems to imply that u=x_{n+1}, in other words the Farey case.

Given t\in\mathbb{R}, we would like to find solutions, probably rather approximate solutions, to the diophantine equation qt-p=0 (so the unknowns p,q are constrained to live in \mathbb{Z}; we may also require them to be coprime, and the denominator shall always be positive). This equation has an exact integral solution if and only if t is rational. Finding an approximate solution amounts to finding a fraction p/q approximating t. We want to define a notion of best approximation. Of course there are rationals arbitrarily close to t, but their denominators are then unbounded; so we fix an amount we are ready to pay, or rather a bound on the denominators, say Q, and we restrict our attention to fractions p/q with q\leq Q. We will say that p/q with q\leq Q is a best approximation (of the first kind) of order Q if \vert qt-p\vert < \vert bt -a\vert for all a/b\neq p/q with b\leq Q. The second kind would be to demand \vert t-p/q\vert to be minimal; it happens that both awards aren't always given to the same candidate.

How well can a number be approximated?
Given a bound Q on the denominators, how close can fractions with denominator q\leq Q come to t? Let \Vert x\Vert \in [0,1/2] be the distance to the nearest integer, so we are looking at \Vert qt\Vert. A simple application of the pigeonhole principle ensures that there exists q\leq Q with \Vert qt \Vert\leq 1/Q, so in particular \Vert qt \Vert \leq 1/q, and there exists a sequence of fractions p/q with \vert t-p/q\vert\leq 1/q^2. Note that the result is not constructive. This exponent 2 is nice, but can we do better? Can we be sure to find an unbounded sequence of q with \Vert qt\Vert \leq 1/q^2 for instance?
The answer is no: Liouville proved that if t is an irrational algebraic number of degree at most d (so that there exists a monic rational polynomial P of degree d\geq 2 such that P(t)=0), then there exists a constant C(t) such that for any p,q, \vert t-p/q\vert\geq C(t)/q^d. Thus for quadratic irrationals (irrationals which are algebraic of degree 2), the exponent 2 is really the best possible. But the discussion concerning the optimal constant C is still open.
So if you define \delta(t) =\sup   \{m\mid\exists\text{ a sequence }p/q\text{ with }q\rightarrow \infty : \vert t-p/q\vert\leq 1/q^m\} then \delta\leq d for algebraic irrationals of degree d.

This theorem seems paradoxical because it states that the irrationals which seem to be the least irrational, the algebraic ones of low degree, are actually the most difficult to approximate. The paradox vanishes (or at least fades) when one remarks that the rationals are actually the most difficult to approximate: indeed, if t=a/b is in lowest terms, and if p/q\neq t is a rational in lowest terms with q\leq Q, then \vert p/q-t\vert\geq \frac{1}{bq}\geq\frac{1}{bQ}, so we get an exponent 1. Notice the trick that a non-zero integer is at least 1, which, together with the mean value theorem, is also the trick to prove Liouville's theorem. Liouville's theorem is also the easiest gateway to exhibiting transcendental numbers; indeed Liouville found numbers with arbitrarily high \delta.

In fact, as far as the exponent is concerned, algebraic irrationals of high degree are not any easier to approximate than quadratic irrationals, as has been revealed by the theorem of Thue-Siegel-Roth, saying that for an algebraic irrational t and any \epsilon>0, there exists a constant C(t,\epsilon) such that for all integers p,q, \vert t-p/q\vert\geq \frac{C(t,\epsilon)}{q^{2+\epsilon}}.

The best exponent is 2 in general, but what about the constant?
We can always find a sequence p/q with \vert t-p/q\vert\leq 1/q^2, and for quadratic irrationals, \vert t-p/q\vert\geq C/q^2; so, between C and 1, what is the optimal constant? The truth, as Hurwitz's theorem says, is that \vert t-p/q\vert < \frac{1}{\sqrt{5}q^2} for infinitely many p,q. And for the golden ratio and its conjugate, \frac{1}{\sqrt{5}} is optimal; for all other numbers, one can do better, certainly arbitrarily better (arbitrarily big constants) for algebraic irrationals of degree more than 2, and merely a better constant for other quadratic irrationals. We'll come back to this later.

How to construct good approximations (exponent 2)?
The existence of a sequence p/q with \vert t-p/q\vert\leq 1/q^2 is always granted but how to construct one? If you naively look at the decimal expansion of your number, you only get an approximation of quality 1/q.
So the correct way to proceed is to use the continued fraction expansion. So take t\in [0,1], and look at the map T(x)=\{\frac{1}{x}\}. Put a_1=\lfloor \frac{1}{t}\rfloor and a_k=\lfloor \frac{1}{T^{k-1}(t)}\rfloor. Then
t=\frac{1}{a_1+\frac{1}{a_2+\cdots}}=:[a_1,a_2,\cdots]. If you write A_k/B_k=[a_1,\cdots,a_k] for the kth convergent, then you can prove that \vert B_kt-A_k\vert<1/B_{k+1}<1/B_k, which is the quadratic approximation we desired.
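
Here is the construction in a few lines (a sketch; t=\sqrt{2}-1 is an arbitrary quadratic irrational, and floating point limits us to the first convergents):

```python
# Convergents A_k / B_k of t via the Gauss map, checking |B_k t - A_k| < 1/B_{k+1}.
import math

t = math.sqrt(2) - 1
x = t
A, B = [1, 0], [0, 1]                   # seeds of the convergent recursion
for k in range(1, 10):
    a = math.floor(1 / x)               # partial quotient a_k
    x = 1 / x - a                       # Gauss map T(x) = {1/x}
    A.append(a * A[-1] + A[-2])
    B.append(a * B[-1] + B[-2])
    if len(B) > 3:
        print(A[-2], "/", B[-2], abs(B[-2] * t - A[-2]), "<", 1 / B[-1])
```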

Best approximation
Not only do continued fractions yield a natural way towards quadratic approximation, they also provide the « best approximations » in the following sense. We fix a number t (let's say for convenience that it's irrational); we say a fraction a/b is a best approximation if every other fraction p/q\neq a/b with 0<q\leq b satisfies \vert bt-a\vert< \vert qt-p\vert.

Then every best approximation is a convergent A_k/B_k=[a_0;a_1,\cdots,a_k], and every convergent is a best approximation.

Which numbers are the toughest to approximate?
We have seen that for every number t, there exists a sequence of fractions p/q with \vert t-p/q\vert\leq \frac{1}{\sqrt{5}q^2}. The only numbers for which \sqrt{5} is optimal are the golden ratio g and its conjugate, solutions of the quadratic equation x^2-x-1=0. We know that g\in (1,2) and g=1+1/g, so a_0=\lfloor g\rfloor=1, and by injecting the description of g into the right-hand side, we get g=1+\frac{1}{1+1/g}=1+\frac{1}{1+\frac{1}{1+1/g}}=\cdots, so the partial quotients are plainly all equal to 1: g=[1;1,1,\cdots]. In particular the expansion is periodic.
We could try to characterize the numbers for which the a_k (called the partial quotients) are periodic. The same question is easy to answer for the other pleasant type of rational approximations, the decimal approximations: for which numbers is the sequence of digits eventually periodic, i.e. t=a_0.a_1a_2\cdots a_k\overline{a_{k+1}\cdots a_m}, where the bar means that the sequence is repeated ad vitam? The answer is very easy: it is the rationals. So for which numbers do we have t=[a_0;a_1,a_2,\cdots a_k,\overline{a_{k+1},\cdots ,a_m}]? Lagrange proved that these are exactly the quadratic irrationals.

Now from Liouville’s theorem, and Thue-Siegel-Roth, we suspect that the toughest numbers to approximate are the quadratic irrationals, so the numbers which have periodic partial quotients. The answer is slightly more general: it is the numbers which have bounded partial quotients. Indeed, we can formulate the following alternative:
For any irrational number t and integer n, either there are infinitely many positive b such that b\Vert bt\Vert<\frac{1}{\sqrt{n^2+4}}, or a_k<n for all sufficiently large indices k. This is a generalization of Hurwitz's theorem.

Other neighbouring questions
What we have just seen is a characterization of the real numbers t such that \inf_{n\geq 1} n\Vert nt\Vert>0: they are the real numbers with bounded partial quotients a_k. Borel showed that this set has Lebesgue measure 0. But of course such numbers do exist, the golden ratio for instance. Now what if you take a second real number s and ask for \liminf_n n\Vert nt\Vert\,\Vert ns\Vert: will you always find 0? Littlewood's conjecture is that this is the case.

Happy new year to all readers! This year, I hope, will be full of mathematical reflections here, alas certainly again in English, in order to reach a new audience (I've just noticed I'm referenced on Terry Tao's weblog for instance).

I have already discussed what this same Tao calls the dichotomy principle (« a massive lot or very few ») for polynomials. It was essentially the case of polynomials in two variables, as in the traditional Bezout's theorem: two coprime polynomials P,Q\in \mathbb{R}[X,Y] of degrees m,n have at most mn common zeros, where \mathbb{R} can in fact be replaced by any field, and with the inequality being an equality in the algebraic closure, if one counts points at infinity and multiplicities of intersection. This can be proven either with the resultant, or by some argument of dimensions.

Lines in curves, planes in surfaces

For instance it implies that if a polynomial of degree m vanishes on m+1 distinct points of the line ax+by+c=0, then ax+by+c is a factor of this polynomial. However it's not relevant at all in this case to invoke Bezout's theorem, because the statement is quite immediate in the case where the line has equation x=0 – expand according to the powers of X, so P=\sum_{k=0}^{m}c_k(Y)X^k, and notice that P(0,Y)=c_0(Y) vanishes at m+1 points though it has degree at most m, so c_0=0 and X divides P; and this line is in fact fairly general (by change of variables).
What about a polynomial in three variables now, so an algebraic surface of the space instead of an algebraic curve in the plane? An algebraic surface can very well include a full line and still be irreducible, as shown by ruled surfaces like the hyperbolic paraboloid z=x^2-y^2. Lines are not the zero set of a truly irreducible polynomial anymore; indeed a line is defined by two equations of degree 1, f=0 and g=0. But this system of equations is, over \mathbb{R}, synonymous with f^2+g^2=0, which involves a polynomial of degree 2 which is irreducible over \mathbb{R} but not over \mathbb{C}. But an algebraic surface cannot contain a plane without the degree 1 polynomial defining the plane factoring out of its equation.

Intersection points, intersection lines
In dimension n, a reasonable generalization of Bezout's theorem must involve enough hypersurfaces (of codimension 1) in order for the intersection to be expected to be points, so of dimension 0; hence you need n hypersurfaces. If their degrees are d_1,\cdots,d_n, it is reasonable to guess d_1\cdots d_n intersection points. But the example of the hyperbolic paraboloid (degree 2) with a line included in it, defined by two equations of degree 1, yields infinitely many points of intersection, instead of the expected 2, although all the equations are coprime. But the correct statement is that if the number of intersections is finite, it's at most the product of the degrees.

However, a variant of Bezout's theorem involving two surfaces in \mathbb{R}^3 is still possible. Surely, you won't bound the number of points of intersection, but you can bound the number of lines of intersection: the intersection of two coprime surfaces of degrees n and m contains at most nm lines. A proof of this can be seen in Larry Guth's lecture notes on the polynomial method, lecture 13.

Monge-Cayley-Salmon: flecnodal points, ruled surfaces
Salmon's theorem, hinted at by Monge earlier and also stated by Cayley, says that if an algebraic surface S=\{(x,y,z)\mid P(x,y,z)=0\} is such that at every point x there exists a line l_x\ni x tangent up to order 3 to S (flecnodal point), then for all points x, the line l_x is in fact fully included and so the surface is ruled. As a line tangent up to order 3 to a surface of degree 2 has to be included in it, we suppose that P is squarefree and of degree d at least 3. Then the flecnodal points of P are determined by a polynomial Fl(P) of degree 11d-24. Together with Bezout's theorem, it yields the following dichotomy for a surface of degree d:
1) Either S is ruled, so there is a line in S through each point;
2) Or S contains only O(d^2) lines.

This is one of the basic ingredients of the Guth-Katz solution of the Erdős distinct distances problem.