It is well-known that a real number is a rational if, and only, if its decimal expansion is periodic (see Wikipedia). This result is a finiteness result, as it says the decimal expansion is, in a way, not really infinite, as it consists in the repeating infinitely a finite block.

A similar finiteness result exists for Laurent series \alpha=\sum_{i=1}^{\infty}\alpha_i T^{-i} \in K((1/T)), which are the analogues of reals modulo 1. Namely, associate to \alpha the infinite matrix (\alpha_{(i+j)-1})_{i,j\in\mathbb{N}^2}. Then Hankel’s theorem states that \alpha is rational if and only if the matrix is of finite rank.

We show here a quantitative version of this statement. Let

\displaystyle{M_{\alpha,n}=\begin{pmatrix} \alpha_{1} & \cdots & \alpha_{n} & \\ \alpha_{2} & \cdots & \alpha_{n+1} & \\ \vdots & \vdots & \vdots & \\ \alpha_{n} & \cdots & \alpha_{2n-1} & \end{pmatrix}.}

The rank of this matrix is related to Diophantine properties of \alpha.
To see this, we introduce for \beta=\sum_{i=-\infty}^{m}\beta_it^i\in\mathbb{F}_p((\frac{1}{t})) the fractional part \{\beta\}= \sum_{i=-\infty}^{-1}\beta_it^i and the norm \lVert{\beta}\rVert=\lvert\{\beta\}\rvert=q^k, where k\leq -1 is the largest i\leq -1 such that \beta_i\neq 0. Then we have the following result.

If the rank of M_{\alpha,n} is at most r, then there exist a decomposition r=i+j and a monic polynomial P of degree i such that \lVert P\alpha\rVert<q^{-2n+j}. The converse holds too.

The classical theory of Hankel matrices (see Chapter X, Paragraph 10 of Gantmacher’s book) tells us that if the rank of the matrix is r, then there exist integers i and h such that i+h=r and the i first rows are linearly independent, each of the following n-r rows is a linear combination of them,
and the minor formed of the first i and last h rows and columns is nonzero. Let L_1,\ldots,L_n be the rows of M_{\alpha,n}. Write L_m=(L_m',a_m) where a_m is the last coefficient of L_m and L'_m the row of the first n-1 coefficients. By the above, we have a relation
L_{i+1}=\sum_{m=1}^i c_mL_m. In fact, we shall show that for any j=0,\ldots,n-r-1, we have

L_{i+1+j}=\sum_{m=1}^i c_m L_{m+j}.

This equation holds for j=0. So we argue by induction and assume it holds for any j'\leq j for some j<n-r-1 and prove it for j+1. To start with, the displayed equation implies that
\displaystyle{L'_{i+1+j+1}=\sum_{m=1}^i c_mL'_{m+j+1}.}
Applying the induction hypothesis iteratively, we find coefficients c_m^{(k)} for m=1,\ldots,i and k\leq j+1 such that

\displaystyle{L_{i+1+k}=\sum_{m=1}^i c_m^{(k)}L_{m}}

\displaystyle{ L'_{i+1+j+1}=\sum_{m=1}^i c_m^{(j+1)}L'_{m}.}
These coefficients satisfy the initial condition c_m^{(0)}=c_m and the recurrence relations c_m^{(k+1)}=c_{m-1}^{(k)}+c_ic_m^{(k)} for m>1 and c_1^{(k)}=c_h^kc_1.

On the other hand, we know that there exist coefficients d_1,\ldots,d_i
such that
\displaystyle{L_{i+1+j+1}=\sum_{m=1}^i d_mL_{m}.}
Comparing the last equation to the one involving L', and using the linear independence of the first i lines, we find that d_m=c_m^{(j+1)}.

Thus a_{i+1+j+1}=\sum_{m=1}^i c_m^{(j+1)}a_m. But \sum_{m=1}^i c_ma_{m+j+1}=\sum_{m=1}^i c_m^{(j+1)}a_m by definition of the coefficients c_m^{(k)}. We infer that a_{i+1+j+1}=\sum_{m=1}^i c_ma_{m+j+1},
which conclude the inductive argument.

To see the connexion with Diophantine properties of \alpha, notice that L_i is the row of the first n coefficients of \{t^{i-1}\alpha\}. Thus the validity of  the identity for L' for all j=0,\ldots,n-r-1 implies that \{t^i\alpha\}=\{\sum_{m=1}^ic_mt^{m-1}\alpha\}+\beta
where \beta\in\mathbb{T} satisfies \lvert\beta\rvert<q^{-2n+h}.
That is, using the polynomial P=t^i+\sum_{m=1}^ic_mt^{m-1}, we find that \lVert P\alpha\rVert<q^{-2n+h} which concludes the proof of the direct statement. The converse is straightforward.


Let V=\mathbb{F}_p^n be a vector space over a finite prime field. One knows then which subsets A\subset V satisfy A+A=A, the sets that are stable under addition: these are the kernels of linear maps, that is, each such set A is the zero set of a bunch of linear forms.

Now we consider a two-coordinates analogous problem, that is, we consider a set A\subset V\times V and define the sets

\displaystyle{P\stackrel{V}{+}P= \{(x,y_1+ y_2)\mid (x,y_1),(x,y_2)\in P\}}

and P\stackrel{H}{+}P, V and H meaning vertical and horizontal. What are the sets that satisfy both P\stackrel{V}{+}P=P and P\stackrel{H}{+}P=P? Call such a set a transverse set. Natural exemples are cartesian products of vector spaces (which we shall call rectangles) as well as zero sets of bilinear forms. More generally, a set of the form \displaystyle{\{(x,y)\in W_1\times W_2\mid Q_1(x,y)=\cdots=Q_r(x,y)=0\}} for some subspaces W_2,W_2\leq V and some bilinear forms Q_i is horizontally and vertically closed. Call such a set a bilinear set, and call its codimensions the codimensions of the subspaces and the number of bilinear forms required (the dimension of the space of bilinear forms vanishing on it). What is the relationship between transverse sets and bilinear sets? Are all transverse sets essentially bilinear sets?

For a transverse set P\subset V\times V and x\in V, let P_x be the fiber above x. Thus P=\bigcup_x \{x\}\times P_x and P_x is a vector space. Actually P_x depends only the projective class [x]\in P(V). Moreover, the stability under horizontal operations is equivalent to the property that if [z] is on the projective line spanned by x and y, we have P_z\supset P_x\cap P_y.

The codimension 1 case

Suppose P_0=V and each subspace P_x has codimension at most 1. Write P_x=\xi(x)^\perp with \xi(x)\in P(V) some vector uniquely defined up to homothety. It is easy to see that the set of x such that P_x=V is a vector subspace which we call W. Furthermore, if x-y\in W, we have \xi_x=\xi_y, so that \xi descends to an injective map on V/W. Thus it is enough to study the case where P_x has codimension exactly one unless x=0. In other words, we have a map \xi : P(V)\rightarrow P(V) satisfying the property that if z is on the projective line spanned by x and y, then \xi_{z} lies on the projective line spanned by \xi_x and \xi_y. Thus \xi maps projective lines into lines. Actually it can be quickly checked that such a map is either bijective or constant on each single projective line.

If we suppose it is injective, too, then by the fundamental theorem of projective geometry, we conclude that it is a projective map. Thus it is induced by a linear map \xi : V\rightarrow V. Hence writing Q(x,y)=\xi(x)\cdot y, we have shown that P is the zero set of the bilinear form Q.

In general \xi is far from being injective. It can for instance be constant on P(V) (equal to H^\perp for some hyperplane H). In this case P=\{0\}\times V\cup V\times H. This is neither a rectangle, nor the zero form of a quadratic form, because if it was, \xi could be taken linear, and because \xi_x=0\leftrightarrow x=0, it should be injective. In fact, if it is the intersection of zero sets of a family of bilinear forms, there needs to be n bilinear forms in this family, such as the forms x_iy_n=0 if (wlog) H is the hyperplane y_n=0. However P obviously contains a very large rectangle.

These are actually the only cases: such a map \xi is either injective or constant. Indeed, if P(V) is just a line, it’s obvious, so suppose \dim V\geq 3. Suppose that \xi maps projective lines into lines (in the sense above) but is neither injective nor constant. That is, there exist two distinct points x,y such that \xi_x=\xi_y, and a third point z satisfying \xi_z\neq \xi_x. This implies that x,y,z span a projective plane. Take a point w\neq y,z on the line spanned by y,z. Because \xi is a bijection on both lines (yz),(xz), you can find w'\neq x,z on (xz) such that \xi_w=\xi_{w'}\neq \xi_x. Now consider the intersection u=(ww')\cap (xy). Then  you have \xi_u=\xi_x=\xi_y\neq \xi_w, so that on the line (ww'), the map \xi is neither constant nor injective, which is absurd.

Arbitrary codimension

A more general potential counterexample, communicated to me by Ben Green, is a set of the form P=\bigcup_{i=0}^r V_i\times W_i where V_i, W_i are sequences of increasing (resp. decreasing) subspaces.  Thus for x\in V, we have P_x=W_{i(x)} where i(x)=\min\{i\mid x\in V_i\}. Again you can make it a bilinear set, but you need atrociously many bilinear forms.

However it contains a large rectangle. In fact, \dim V_i\leq n-(r-i) while \dim W_i=n-i so that the density of P is at most (r+1)p^{-r}\ll p^{-r/2}. Thus r\ll \log\delta^{-1}. Finally by the pigeonhole principle, one of the cartesian products V_i\times W_i have density at least \delta/\log\delta^{-1}.

In general, we have a vector space P_x=A_x^\perp. Suppose for instance each A_x has codimension 2. If we manage to find maps x\mapsto \xi_1(x),\xi_2(x) such that \xi_i(x+y)\in\text{span} (\xi_i(x),\xi_i(y)) for each i,x,y, we are in good shape. Though it is not clear how to show that such a consistent choice of bases can be made…

The conjecture

If a A\subset V\times V has cardinality at least \delta \lvert V\rvert ^2, it contains a bilinear set of (linear and bilinear) codimensions c(\delta)=O(\log^{O(1)}\delta^{-1}).


Given a compact abelian group G, endowed with a Haar probability measure \mu, let \mathcal{F}_a be the set of (Borel) measurable functions G\rightarrow [0,1] satisfying \int_G fd\mu \geq \alpha. The average of f is called its density.

Let \Psi be system of linear forms, i.e \Psi=(\psi_1,\ldots,\psi_t): G^d\rightarrow G^t and


where c_{i,j}\in\mathbb{Z} for any i\in[t] and j\in[d]. For a measurable function f : G\rightarrow [0,1], one may define the \Psi-count of f as

\displaystyle{T_\Psi(f)=\int_{G^d}\prod_i f(\psi_i(x))d\mu(x).}

Here by a slight abuse of notation, the measure \mu on G^d denotes in fact the product measure \mu^{\otimes d} of the Haar measure on G.



When a system \Psi is fixed, we may omit it from the notation m_\Psi. Given a sequence G_n of finite groups of cardinality tending to infinity, it is natural to consider the sequence m(\alpha,G_n). When the infimum is taken only over \{0,1\}-valued measurable functions of density at least \alpha, this is the minimum number of APs a set A\subset G_n of density at least \alpha can have. But these infimum are actually the same (basically by convexity).

The case G_n=\mathbb{Z}/n\mathbb{Z} was introduced by Croot. He proved that the minimal density m(\alpha,\mathbb{Z}_N) of three-term arithmetic progressions converged as N\rightarrow\infty through the primes. Sisask, in his PhD thesis, was able to express this limit as an integral over the corresponding limit object, namely the circle \mathbb{T}=\mathbb{R}/\mathbb{Z}, thus

\displaystyle{\lim_{\substack{N\rightarrow +\infty\\ N\text{ prime}}} m(\alpha,\mathbb{Z}_N)=m(\alpha,\mathbb{T}).}

As customary, the corresponding problem in finite field models is cleaner. Thus we set G_n=\mathbb{F}_p^n. Whereas Croot noticed that the sequence m(\alpha,\mathbb{Z}_N) was in some sense approximately decreasing, it is easy to see that for any system \Psi, the sequence m_\Psi(\alpha,G_n) is genuinely non-increasing. This is because \mathbb{F}_p^n embeds naturally in \mathbb{F}_p^{n+1}; thus if f is a map from \mathbb{F}_p^n to [0,1], it can be extended into \tilde{f} : \mathbb{F}_p^{n+1}\rightarrow [0,1] by putting \tilde{f}(x_1,\ldots,x_{n+1})=f(x_1,\ldots,x_n).

Now \mathbb{E} \tilde{f}=\mathbb{E}f, and T_\Psi(f)=T_\Psi(f). This proves that m_\Psi(\alpha,G_n)\geq m_\Psi(\alpha,G_{n+1}), which implies that this sequence has a limit.

Let \Psi be now fixed. We want to find an appealing expression for the limit of the sequence m(\alpha,G_n). It is thus tempting to look for a natural topological compact group that would play the role of limit object of the sequence G_n, just as \mathbb{T} played the role of limit object for the sequence \mathbb{Z}/N\mathbb{Z}.

The analogy \mathbb{Z}\leftrightarrow \mathbb{F}_p[t]=\cup \mathbb{F}_p^n=\mathbb{F}_p^{\infty} leads one to introducing G_\mathbb{N}=\mathbb{F}_p^\mathbb{N} and G_\infty=\mathbb{F}_p^{\infty}. The group G_\mathbb{N} equipped with the product topology is a compact group by Tychonoff’s theorem, and G_\infty equipped with the discrete topology is locally compact; it is the Pontryagin dual of G_\mathbb{N}. It is now reasonable to ask whether

\displaystyle{\lim m(\alpha,G_n)=m(\alpha, G_\mathbb{N}).}

One easily notices that m(\alpha,G_n)\geq m(\alpha, G_\mathbb{N}) for any n\in\mathbb{N} by extending, as above, a function f : \mathbb{F}_p^n \rightarrow [0,1] into a function \tilde{f} on \mathbb{F}_p^\mathbb{N} with the same density and the same \Psi-count. Indeed, one can define


and then \tilde{f} is a measurable function which has same density and same \Psi-count.

We prove the other inequality. For this we prove two intermediate propositions. We write from henceforth G=G_\mathbb{N}.

Proposition 1

If f_n is a sequence of [0,1]-valued functions on G that converges in L^2(G) to a [0,1]-valued function f, then T(f_n)\rightarrow T(f).


Let F_n(x)=\prod_{i\in[t]} f_n(\psi_i(x))-\prod_{i\in[t]} f(\psi_i(x)). We show that F_n converges in L^1(G^d) to 0, which will conclude. In fact we prove a stronger convergence, a convergence in L^2. We show that \lVert F_n-F\rVert_{2}\leq t \lVert f_n-f\rVert_2. For that we simply remark that


Now, any non-trivial linear map \theta : G^d\rightarrow G, i.e. any map of the form

\displaystyle{\theta(x_1,\ldots,x_d)=\sum_{i=1}^d c_ix_i.}

with (c_1,\ldots,c_d)\in \mathbb{F}_p^d\setminus\{0\}, preserves the Haar measure, which implies that

\displaystyle{\lVert f_n\circ\psi_j-f\circ\psi_j\rVert_{L^2(G^d)}=\lVert f_n-f\rVert_{L^2(G)}.}

Using this, the triangle equality and the fact that the functions are 1-bounded, we conclude.

Proposition 2

Let \mathcal{F}_{a,n} be the set of functions \tilde{f} for f : G_n\rightarrow [0,1] of density at least \alpha. Then \mathcal{F}_{a,n} is dense in (\mathcal{F}_a, \lVert \cdot\rVert_2).


Let f\in\mathcal{F}_aWe construct a sequence of functions f_n\in\mathcal{F}_{a,n} that converge to f. We use duality and Fourier transform. We have \hat{G_n}=G_n and \hat{G_\mathbb{N}}=G_\infty=\cup G_n.

For \chi\in G_\infty, write


Then write


This defines a function f_n : G_n \rightarrow \mathbb{C}, which obviously satisfies


The same holds for its extension \tilde{f_n}Notice that \tilde{f_n} is also the conditional expectation with respect to the \sigma-algebra \mathcal{P}(G_n)More precisely,

\displaystyle{f_n(x_1,\ldots,x_n)=\mathbb{E}_{y\in G}(f(y)\mid (y_1,\ldots,y_n)=(x_1,\ldots,x_n))}

as both equal probacondi.

This implies in particular that f_n, and thus \tilde{f_n}has its values in [0,1].

To check that \mathbb{E} f_n \geq \alpha, we use the conditional expectation interpretation

\displaystyle{\mathbb{E}[f_n]=\mathbb{E}[\mathbb{E}[f\mid G_n]]=\mathbb{E}[f]\geq \alpha.}

Now we check the convergence. It follows from basic harmonic analysis: f being in L^2(G), its Fourier series f_n converges to f in L^2(G). This concludes the proof of Proposition 2.

Thus even for 4-APs, this shows that


exists and equals

\displaystyle{\inf_{f:\mathbb{F}_p^\mathbb{N}\rightarrow[0,1],\int f\geq \alpha}\int f(x)f(x+y)f(x+2y)f(x+3y)dxdy.}

This is in contrast to a paper of Candela-Szegedy, for the same pattern \Psi (4-APs) but with G_n=\mathbb{Z}/n\mathbb{Z} where they obtain as a limit a much more complicated infimum, an infimum over functions f on \mathbb{T}^2 of the \Psi'-count, where \Psi'\neq\Psi is another pattern. The part that holds in the setting of \mathbb{F}_p^n and not in the setting of \mathbb{Z}_p is Proposition 2. The truncated Fourier series of f : \mathbb{T}\rightarrow[0,1], which is the natural L^2 approximation, does not give a function on cyclic groups \mathbb{Z}_p with the same density and AP-count as f.

We remark that the paper of Hatami-Hatami-Hirst implies that the sequence of functions f_n admits a limit object g which is defined on a finite cartesian power of G_\mathbb{N}.

In fact, if \Psi is the k-APs, pattern of complexity k-2, and p\geq k, the limit object g is defined on G_\mathbb{N}^{k-2}. In particular, if k=3, this means that the infimum above is attained, i.e.

\lim_{n\rightarrow \infty}m(\alpha, G_n)=T(g)

for some g : \mathbb{F}_p^\mathbb{N}\rightarrow [0,1] measurable. When gnzn, it is still not known whether such a result (i.e. the existence of a limit object g : \mathbb{T}\rightarrow [0,1]) holds. However, Szegedy proves the existence of a limit object g : X\rightarrow [0,1] on a non-explicit group X.


A recent breakthrough due to Croot, Lev and Pach, recycled by Ellenberg-Gijswijt and streamlined by Tao on this blog post, revolutionised the landscape of Roth’s type problems in vector spaces over finite fields. Indeed, Ellenberg-Gijswijt obtained that a set A\subset \mathbb{F}_3^n free of triple satisfying x_1+x_2+x_3=0 can have at most 2.756^n elements out of the 3^n out there. Here we show the same method can deal with sets free of solutions to various other equations in any \mathbb{F}_p^n (given that if q=p^k, we have \mathbb{F}_q^n\sim \mathbb{F}_p^{kn} as a vector space, there’s no point in considering general \mathbb{F}_q^n) and we make explicit the dependence on p of the resulting bound. We refer in the sequel mostly to Tao’s symmetric proof instead of the original articles. One great merit of this exposition is to reveal, thanks to the extension of the notion of rank from matrices (seen as functions of two variables) to functions of more variables, how the scope of the method can be extended.

Other linear equations

In fact the bound found for the sets without 3-term arithmetic progressions is also valid
for a set A\subset \mathbb{F}_p^n having no non-trivial solutions to
where \phi_1,\phi_2,\phi_3 are three linear automorphisms of \mathbb{F}_p^n summing to 0. Indeed,
we can then still write
\displaystyle{\delta_{0^n}(\phi_1(x_1)+\phi_2(x_2)+\phi_3(x_3))=\sum_{a\in A}\delta_a(x_1)\delta_a(x_2)\delta_a(x_3).}
Even more generally, for any integer k\geq 3, if \phi_1,\ldots,\phi_k are k automorphisms of \mathbb{F}_p^n which sum to 0, and if A\subset \mathbb{F}_p^n does not have any non-trivial solution to the equation \sum_{i\in[k]}\phi_i(x_i)=0, we can still write

\displaystyle{ \delta_{0^n}(\sum_{i\in[k]}\phi_i(x_i))=\sum_{a\in A}\prod_{i\in[k]}\delta_a(x_i). }

The right-hand side is still a function on A^k of rank \lvert A\rvert. Meanwhile, the left-hand side may still be expanded using the formula \delta_{0}(x)=1-x^{p-1} valid for all x\in\mathbb{F}_p, which implies that the LHS equals

\displaystyle{ \prod_{j\in[n]}(1-(\sum_{i\in[k]}\phi_{i}^j(x_i))^{p-1}) }

where \phi_i^j means the j-th coordinate of \phi_i; it is
thus a linear polynomial. This implies that the whole expression is a polynomial of degree (p-1)n. The treatment shown by Tao on his blog to bound the rank of this expression can thus be applied here too.

Roth’s theorem is simply the case where k=3 and \phi_1=\phi_2=\phi_3=\text{Id} for p=3, or more generally \phi_1=\phi_2=\text{Id} and \phi_3=-2\text{Id}.

The more variables, the smaller the bound becomes. For instance, a set A\subset \mathbb{F}_5^n without any solution to the equation x+3y=2z+2w, in particular without any 3-APs, has size at most 3.83^n, less than the bound for sets without 3-APS which is 4.46^n.

Even non-linear equations

Upon scrutinizing the polynomial method at work in Ellenberg-Gijswijt, one can be struck by the fact that the translation-invariance of the equation x+y+z=0 is not even used. In Meshulam’s theorem, as in all form of density increment strategy, we crucially use the fact that if (x,y,z) is a solution and if c\in\mathbb{F}_p^n is any constant, then (x+c,y+c,z+c) is a solution too. What is used in the polynomial strategy is that any constant tuple (c,\ldots,c) is a solution, which is vastly weaker (though it is synonymous for linear patterns). Thus if \phi_1,\phi_2,\ldots,\phi_k are any bounded-degree polynomial maps \mathbb{F}_p^n \longrightarrow\mathbb{F}_p^m (i.e. maps whose coordinates are polynomials of degree at most d) summing to 0, then the method should, in principle, spit out a bound for the maximal size of a set A\subset \mathbb{F}_p^n free of solutions to \sum\phi_i(x_i)=0, depending on the degree. The only thing that changes in Tao’s write-up is that the polynomial


 whose rank we try to bound is now of degree up to (p-1)md. Thus the parameter of relevance and which determines whether the method will succeed or not is m/n \cdot d/k. It has to be significantly less than 1/2. Otherwise, the method requiring a good bound (exponential if possible) on

\mathbb{P}(X_1+\cdots+X_n\leq (p-1)m d/k)

where X_1,\ldots, X_n are independent, identically distributed variable on \{0,\ldots,p-1\}, we are hopeless.

Thus, considering m=2n,d=2, we need k=9 variables.

Equations in a polynomial ring

As advocated by Tom Bloom (for instance in this paper), it might be of interest to ask for analogues of theorems in polynomial ring. However, a relation of the form f_1+f_2=2f_3 among three polynomials over \mathbb{F}_q of degree less than n is simply a relation among three vectors, the vectors of their coefficients. But we can form much more general equations, such as

\displaystyle{\sum_{i=1}^k\alpha_i f_i=0}

for some \alpha_1,\ldots,\alpha_k \in \mathbb{F}_p[t]. We require again \sum_i\alpha_i=0. Now the map f\mapsto \alpha f on the polynomials of degree less than n translates, in term of the coefficients of a polynomial, into a linear map \phi from \mathbb{F}_p^n to \mathbb{F}_p^{n+d} where d= \text{deg }\alpha.
Let d be the maximum of the degrees of the \alpha_i and let us introduce the maps \phi_i :\mathbb{F}_p^n\mapsto \mathbb{F}_p^{n+d} corresponding to asking for the size of A\subset \mathbb{F}_p[t] without any solution to the equation above amounts to asking for the size of a set A\subset \mathbb{F}_p^n without any solution to the equation \sum_i\phi_i(x_i)=0, a situation we have already analysed.
Even better, translating the map f\mapsto f^2 into a a polynomial map \mathbb{F}_p^n\mapsto \mathbb{F}_p^{2n-1}, namely
we’re back to the previous paragraph. Here m=2n-1,d=2, so that to ensure md/nk<1/2, we need k\geq 9. We then get a bound \lvert A\rvert \ll (0.99 p)^n. This number of variables is terrible. In fact, in the integers, Browning and Prendiville showed that 5 variables suffice for such an equation, that is, they got a bound for the size of a set A\subset [N] without solution to the equation \sum_{i=1}^4 x_i^2=4x_5^2. It is not satisfactory to end up needing more variables in the function field setting than in the integer case. The polynomial method does not seem good at tackling polynomial equations.

Dependence on p

Let’s return to the case of Roth’s theorem.
Following Tao, one gets a bound of the form \lvert A \rvert\ll n^{p-1}\exp(nc_p)
where c_p is the maximum of

\displaystyle{ h(\alpha_0,\ldots,\alpha_{p-1})=-\sum_{k=0}^{p-1}\alpha_k\log\alpha_k}

under the constraints

\displaystyle{ \alpha_i\geq 0, \sum \alpha_i=1,\sum i\alpha_i\leq (p-1)/3}

It could be interesting to know how c_p evolves with p.
Using the help of a computer to compute c_p for p upto 10^8, I got the feeling that \exp(c_p)/p\approx 0.84+0.30/p. This would mean that a subset A\subset \mathbb{F}_p^n free of 3-term arithmetic progressions necessarily satisfies \lvert A \rvert\ll (0.85p)^n.

We now proceed to show that the empirically conjectured behaviour of c_p is indeed the true one.

The constrained extrema theorem implies that the maximum of h is attained on a point where

\displaystyle{\overrightarrow{\mathrm{grad}} h=-(\alpha_0\log\alpha_0+1,\ldots,\alpha_{p-1}\log\alpha_{p-1}+1)}

belongs to \text{Vect}((1,\ldots,1),(0,1,\ldots,p-1)). This condition amounts to the existence of a,b\in\mathbb{R} so that \alpha_i\log\alpha_i=\exp(1-a-ib) for all indices i. So we have only two degrees of freedom for the tuple (\alpha_0,\ldots,\alpha_{p-1}. And because we have two equations

\displaystyle{ \sum \alpha_i=1,\sum i\alpha_i= (p-1)/3}

there should be a unique such tuple.
Putting q=\exp(-b), we transform the first constraint in

\displaystyle{ \exp(1-a)\sum_{i=0}^{p-1} q^i=1 }

thus the second one amounts to

\displaystyle{ \sum_{i=0}^{p-1}iq^i=\frac{p-1}{3}\sum q^i. }

Here the LHS equals

\displaystyle{ \frac{((p-1)q-p)q^p+q}{(q-1)^2}. }

Thus the equation to solve is

\displaystyle{ ((p-1)q-p)q^p+q=\frac{(p-1)(q-1)}{3}(q^p-1), }

in other words

\displaystyle{ 2(p-1)q^{p+1}-(2p+1)q^p+3q+(p-1)q=(p-1). }

It clearly imposes that as p\rightarrow\infty, we must have q\rightarrow 1, for if it remains bounded away from 1, the left-hand side is asymptotic to (p-1)q, which is not asymptotic to p-1, the right-hand side.

We now rearrange the equation as

\displaystyle{ p(q+2q^{p+1}-2q^p)-2q^{p+1}-q^p+2q=p-1 }

that is

\displaystyle{ p(q-1)(1+2q^p)=2q^{p+1}+q^p-2q-1=(q^p-1)(2q+1). }

Let us introduce \epsilon=1-q\rightarrow 0 and determine the behaviour of p\epsilon, or equivalently of q^p. One can prove by contradiction that p\epsilon cannot tend to 0. Indeed, supposing it did, one could write

\displaystyle{ q^p=\exp(p\log(1-\epsilon))=\exp(-p\epsilon)\exp(O(p\epsilon^2)) =1-p\epsilon+(p\epsilon)^2/2+o(p\epsilon)^2. }


\displaystyle{ -p\epsilon(3-2p\epsilon+(p\epsilon)^2+o(p\epsilon)^2)=(-p\epsilon+(p\epsilon)^2/2+o(p\epsilon)^2)(3-2\epsilon) }

and comparing the term in (p\epsilon)^2 we see an absurdity.
It can’t either tend to \infty. If it has a limit point \alpha, it has to be a solution to

\displaystyle{ -\alpha(1+2\exp(-\alpha))=3(\exp(-\alpha)-1). }


The computer declares that the solution to this equations equals 2.149125799907062 (which is very close, fortunately enough, to values observed for p(1-q) for large p). Thus q^p tends to \exp(-\alpha)\approx 0.11.

Injecting the expression for the \alpha_i in terms of a,b,  the optimum is

\sum_k(a+kb-1)\exp(1-a-kb) = \exp(1-a)\left((a-1)\sum q^k+b\sum kq^k\right)

which we can equate, remembering earlier formulae, to
So let us derive an asymptotic expression for a and b. We quickly obtain a=\log p+\log\frac{1-\exp(-\alpha)}{\alpha}+1+o(1), b=\alpha/p+o(1/p).
So the maximum we are after equals

\displaystyle{c_p= \log p+\log\frac{1-\exp(-\alpha)}{\alpha}+\alpha/3+o(1) }

This implies that

\displaystyle{\frac{\exp(c_p)}{p}= \frac{1-\exp(-\alpha)}{\alpha}\exp(\alpha/3). }

Injecting the value we found for \alpha, the computer spits out 0.84, as guessed.


Many properties of the set of the primes are inherited by any dense subset thereof, such as the existence of solutions to various linear equations. But there’s one property, recently proved, namely the existence of infinitely many gaps p_{n+1}-p_n\leq C bounded by a constant, which is not inherited by dense subsets of the primes. For instance, it is possible to remove basically one half of the primes and be left with a subset of primes \{p'_n\mid n\in\mathbb{N}\} such that p'_{n+1}-p'_n\ll \log p_n.

So instead one could ask if when coloring the primes in finitely many colours, one is guaranteed to find a colour class containing infinitely many pairs of primes whose difference is bounded by a common constant. This does not follow immediately from pigeonhole principle, which only says that one colour class contains infinitely many primes.

However, this follows easily if we recall that the work of Maynard does not only give infinitely many bounded gaps, that is \liminf p_{n+1}-p_n<+\infty, but also for any m that \liminf p_{n+m}-p_n < +\infty. Let H(m) be this liminf.

In other words there are infinitely many (disjoint) intervals of size I_n=[x_n,x_n+H(m)] which contains at least m+1 primes. So if you colour the primes in m colours, and you consider these intervals, you find that each one must contains at least two primes of the same colour. By pigeonhole principle again, there must be a colour for which this occurs infinitely often. Whence the existence in this colour class of infinitely gaps of size at most H(m).

The squarefree numbers form a set which is well-known to have density 6/\pi^2 in the integers, in other words their characteristic function \mu^2 satisfy \sum_{n\leq x}\mu^2(n)=6/\pi^2x+o(x). In this set all your dream theorems about the count of linear patterns are true and easy.

Hardy-Littlewood squarefree tuples

The Hardy-Littlewood conjecture consists, in its strongest form, in an asymptotic, for every \mathcal{H}=\{h_1,\ldots,h_k\}, of the form

\displaystyle{\lvert\{n\leq x\mid n+h_1,\ldots,n+h_k\text{ are all prime}\}\rvert=(c(\mathcal{H})+o(1))\frac{x}{\log^kx}}

where c(\mathcal{H}) is a constant possibly equal to 0. It is quite far from being proven, although interesting steps towards it have been made in the last ten years, by Goldston-Pintz-Yildirim and Maynard.

In the squarefree numbers, the analogous problem is rather easy and was settled by Mirsky in 1947. He proved that

\displaystyle{\lvert\{n\leq x\mid n+h_1,\ldots,n+h_k\text{ are all squarefree}\}\rvert=c(\mathcal{H})x+O(x^{2/3+\epsilon})}



and \nu_{\mathcal{H}}(p^2) is the cardinality of the size of the projection of the set \mathcal{H} in \mathbb{Z}/p^2\mathbb{Z}, that is the number of forbidden classes modulo p^2 for n.

I show a quick way of deriving it with a much worse error term O(x/\log x). Fix w, a large number; ultimately, w=C\log x should be ok. Fix W=\prod_{p\leq w}p^2. Then the number X_w of n\leq x such that none of n+h_1,\ldots,n+h_k have a square factor p^2 with p\leq w is easy to compute: in [W] or any interval of length W there are by chinese remainder theorem this many of them
\displaystyle{ \prod_{p\leq w}(p^2-\nu_{\mathcal{H}}). }
Thus in [x], there are this many of them

\displaystyle{x/W^2\prod_{p\leq w}(p^2-\nu_{\mathcal{H}})+O(W^2)}.

There are also people who are not squarefree, but escape our sieving by primes less than w. These are the guys such that at least one of the n+h_i have a divisor p^2 with p>w prime. Now there are are at most x/p^2 multiples of p^2 in [x]. So at most
\displaystyle{ \sum_{w<p\leq \sqrt{x+\max h_i}}k(x+\max h_i)p^{-2}=o(x/w) }
should be removed. Let’s try to balance the error terms O(x/w) and O(W^2)=\exp(2w(1+o(1)); take w=\frac{1}{2}(\log x-\log\log x).
This way both seem to be O(x/\log x).

Green-Tao type theorem

We can also easily get asymptotics of the form

\displaystyle{\sum_{n\in\mathbb{Z}^d\cap K}\prod_{i\in[t]}\mu^2(\psi_i(t))=\text{Vol}(K)(\prod_p\beta_p+o(1))}

where K\subset [0,N]^d is a convex body and \Psi a system of affine-linear forms of which no two are affinely related. Moreover \beta_p is again the proportion of non-forbidden vectors a=(a_1,\ldots,a_d) of residues modulo p^2. One way to do it would be to simply use the Hardy-Littlewood type asymptotics proved above coordinate by coordinate. We can also prove it quite easily, by first observing that the set of n where at least one of the \psi_i vanishes has size O(N^{d-1}) because the 0-set of an affine-linear form is an affine hyperplane. Then as \psi_i(n)=O(N), the divisors a_i will have to satisfy a_i\ll \sqrt{N}. So now we restrict the set of n to the ones where the forms don’t vanish. Then we partition the sum into one where a_1 is restricted to be at most N^{\delta} (we fix some small \delta >0), and the remaining one. Using the well-known fact that \mu^2(n)=\sum_{d^2\mid n}\mu(d), we remark that the remaining one equals

\displaystyle{  \sum_{N^{\delta}\leq a_1\ll \sqrt{N}}\mu(a_1)\sum_{n:a_1^2\mid \psi_1(n)}\mu^2(\psi_2(n))\cdots\mu^2(\psi_t(n))}

Now the number of n in the sum is O(N^d/a_1^2+N^{d-1}). Given that \sum_{N^{\delta}\leq a_1} a_1^{-2}\ll N^{-\delta}, we can bound this sum by O(N^{d-\delta}). We proceed similarly for i=2,\ldots,t, so that the sum to estimate is

\displaystyle{  \sum_{a_1,\ldots,a_t\leq x^\delta}\prod_i\mu(a_i)\lvert\{n\in\mathbb{Z}^d\cap K\mid \forall i \quad \psi_i(n)\equiv 0\mod a_i^2\}\rvert}

Quite gross volume packing arguments indicate that the number of integral points (lattice points) to estimate above is \text{Vol}(K)\alpha_{\Psi}(a_1^2,\ldots,a_t^2)+O(N^{d-1+2t\delta}), where

\displaystyle{ \alpha_{\Psi}(k_1,\ldots,k_t)=\mathbb{E}_{n\in [\text{lcm}(k_i)]^d}\prod_i1_{k_i\mid\psi_i(n)}}


is the density of zeroes modulo the a_i^2. Moreover we can extend the sum beyond N^\delta at the cost of a mere N^{O(\delta)}. Hence finally for any \epsilon >0 we can write that

\displaystyle{\sum_{n\in\mathbb{Z}^d\cap K}\prod_{i\in[t]}\mu^2(\psi_i(t))=\text{Vol}(K)\prod_p\beta_p+o_\epsilon(N^{d-1+\epsilon}).}

Indeed by multiplicativity it is easy to transform \sum_{a_1,\ldots,a_t}\alpha_{\Psi}(a_1^2,\ldots,a_t^2)\prod_i\mu(a_i) into a product over primes.





The quest for repeated, infinitely frequent patterns in the primes, is certainly a very old one, and is often even a quest for the asymptotic frequency of these patterns, which is much harder. For instance proving that there are infinitely many primes is easy, finding a satisfactory answer for the question « How many (roughly) until N large ? » much less so. Inbetween lies the obtention of upper and lower bounds of the right shape, due by Chebyshev.

Here pattern stands for images of a system of polynomial forms. Given \Psi=(\psi_1,\cdots,\psi_t)\in(\mathbb{Z}[X_1,\cdots,X_d])^t and a convex set K=K_N\subset\mathbb{R}^d of volume increasing to infinity, we are thus interested in evaluating

\displaystyle{\sum_{n\in K\cap\mathbb{Z}^d}\prod_{i=1}^t\Lambda(\psi_i(n))}

for N large, where \Lambda is the von Mangoldt function. The case where d=1 and \psi_i(n)=n+b_i is the original question of Hardy and Littlewood, who proposed a tantalizing asymptotic behaviour but is still completely out of reach (even the question whether there are infinitely many n such that n,n+2 are both primes is not settled). But the case where the system \Psi is affine-linear (thus the polynomials are all of degree 1) and no two forms are affinely dependent was solved by Green and Tao in the celebrated article Linear equations in primes.

Similar results for more general polynomial forms are rare. We have to mention the famous work of Friedlander and Iwaniec yielding an asymptotic for the number of primes of the form p=x^2+y^4, where it appears that

\displaystyle{\sum_{x\leq\sqrt{N},y\leq N^{1/4}}\Lambda(x^2+y^4)}\sim Cx^{3/4}

for some constant C>0.

I have uploaded yesterday an article on the ArXiv which provides asymptotics of the same shape as the ones in the Hardy-Littlewood for a few exceptional polynomial patterns.  Thus for instance, I can tell how many arithmetic progressions of three primes smaller than N exist whose common difference is a sum of two squares – well not quite, because I have to weigh these arithmetic progressions by the number of representations of the common difference. Now this weight, giving a positive density to the set of sums of two squares, which is sparse, of density N/\sqrt{\log N}, just as the von Mangoldt function is a weight (mostly) on primes giving them a density, cannot be easily eliminated afterwards, in contrast to the von Mangoldt function (one can write for n\leq N that \Lambda(n)\sim 1_\mathbb{P}(n)\log n\sim 1_\mathbb{P}(n)\log N).

More precisely, the result that naturally comes out concerning three term arithmetic progressions with common difference a sum of two squares is

\displaystyle{\sum_{1\leq a\leq a+2d\leq N}\Lambda(a)\Lambda(a+d)\Lambda(a+2d)R(d)=\pi N^2/4\prod_p\beta_p+o(N^2)}

where R(n)=\mid\{(x,y)\in\mathbb{Z}^2\mid n=x^2+y^2\}\mid is the representation function and \beta_p are some explicit constant which I don’t reproduce here. Moreover, we can generalise to other positive definite binary quadratic forms than this one, and there’s nothing special about length three: an asymptotic is available for any length. Here we notice that in some sense, the result is only seemingly polynomial, and truly rather linear: the polynomial nature of the pattern is enclosed in a linear input into the representation function of a quadratic form.

In fact, my article contains a more general result of which the one above is only a particular case. My work consisted in mingling the von Mangoldt function with the representation functions of quadratic forms, whose behaviour on linear systems have been already analysed respectively in by Green and Tao and Matthiesen. The idea is to consider sums of the form

\displaystyle{\sum_{n\in\mathbb{Z}^d\cap K}\prod_{i=1}^tF_i(\psi_i(n))}

where F_i can be the von Mangoldt function or a representation function, and the $\psi_i$ are linear forms. The cohabitation of both types of functions went quite well. One delicate point was to eliminate biases modulo small primes of both types functions, an operation known as the W-trick. The difficulty is that while the value of the von Mangoldt function is more or less determined by the coprimality to small primes, it is not so for the representation function, which is also sensitive to the residue modulo large powers of small primes. Once this issue is adressed carefully, it is possible to majorize them by one and the same pseudorandom majorant, which opens the way to the application of the transference principle.

Similarly, the cohabitation between the von Mangoldt function and the divisor function is quite natural, yielding asymptotics for expressions such as \sum\Lambda(n)\Lambda(n+ab)\Lambda(n+2ab)=\sum\Lambda(n)\Lambda(n+d)\Lambda(n+2d)\tau(d). This is reminiscent of the Titchmarsh divisor problem, the evaluation of \sum_n\Lambda(n)\tau(n+a) or (almost equivalently) of \sum_p\tau(p+a), but the latter expression involves a linear system of infinite complexity, and is thus altogether out of reach of my method, just as the twin primes or the basic Hardy-Littlewood conjecture.

One may hope to extend the generalised Hardy-Littlewood conjecture stated (and proven in the finite complexity case in the paper linked) by Green and Tao to polynomial systems. For example given a polynomial \displaystyle{\phi \in \mathbb{Z}[x_1,\ldots,x_d]} and a bounded domain K\subset \mathbb{R}^d (probably a convex body would be more reasonable), one may be interested in an asymptotic for

\displaystyle{\lvert\{x=(x_1,\ldots,x_d)\in K\cap \mathbb{Z}^d\mid \phi(x)\text{ is prime}\}\rvert.}

We will rather try to get an asymptotic of the form

\displaystyle{\sum_{(x_1,\ldots,x_d)\in\mathbb{Z}^d\cap K}\Lambda(\phi(x_1,\ldots,x_d))=\beta_{\infty}\prod_p\beta_p+o(\beta_{\infty})}

where \beta_{\infty} is basically the number of points with positive integer coordinates in K, so hopefully \beta_{\infty}=\text{Vol}(K\cap \phi^{-1}(\mathbb{R}_+)), and the local factors take into account the obstructions or the absence of obstructions to primality modulo p. Recall that \Lambda classically denotes the von Mangoldt function. There are only very few non-linear polynomials for which an asymptotic for the number of prime values is available. The easiest one is obviously \phi(x,y)=x^2+y^2. Indeed, the primes which are sum of two squares are the primes congruent to 1 modulo 4 (and also the prime 2, but it’s a single prime, so we don’t have to care), and they are represented 8 times each as a value of \phi. So

\displaystyle{\sum_{x^2+y^2\leq N}\Lambda(x^2+y^2)\sim\sum_{n\leq N,n\equiv 1\text{ mod } 4}\Lambda(n)\sim 4N.}

Now let’s check what the conjecture would say. Here \beta_{\infty}=\text{Vol}(\{(x,y)\mid x^2+y^2\leq N\})\sim \pi N. What about the \beta_p ? They are supposed to be \beta_p=\frac{p}{p-1}\mathbb{P}(\phi(x,y)\neq 0\text{ mod } p). Now it easy to check that \mathbb{P}(\phi(x,y)\equiv 0\text{ mod } p)=\frac{2p-1}{p^2} if p\equiv 1\text{ mod } 4, and \mathbb{P}(\phi(x,y)\equiv 0\text{ mod } p)=\frac{1}{p^2} otherwise. So \beta_p=\frac{p-1}{p} in the former case, and \beta_p=\frac{p+1}{p} in the latter case. Thus, \prod_p\beta_p doesn’t converge absolutely, in contrast with the traditional Green-Tao situation, where \beta_p=1+O(1/p^2)… However, if we imagine that to each prime p\equiv 1\text{ mod } 4 corresponds a prime p'\equiv 3 \text{ mod } 4 with p\approx p', we could compute the product of the \beta_p by grouping the factors into pairs \beta_p\times\beta_{p'}\approx 1-1/p^2. A bit more precisely,

\displaystyle{\prod_{p\equiv 1\text{ mod } 4,p\leq N}(1+1/p)\sim C\sqrt{N}}


\displaystyle{\prod_{p\equiv 1\text{ mod } 4,p\leq N}(1-1/p)\sim C'/\sqrt{N}}

which implies that

\displaystyle{\prod_{p\leq N}\beta_p}\sim CC'

so this product is convergent, although not absolutely. I guess the constant C' is e^{-\gamma/2}, see Mertens theorem, and that C=e^{\gamma/2}\prod_{p\equiv 1\text{ mod } 4}(1-p^{-2}). I don’t really know how to compute this convergent product. We quickly notice that \beta_2=1. If the product could be equal to 4/\pi, which it can’t unfortunately, being smaller than 1, we would apparently get the correct result, but this remains a quite dodgy case, which should make one cautious about stating ambitious generalisations of the Hardy-Littlewood conjecture.

When one considers the sequence of values f(n) of some reasonable arithmetic function, like the divisor function \tau, or the function \omega(n) the number of prime factors of n, one may notice that excluding a few abnormal values, the sequence looks very much like a neat, recognizable sequence. Or at least, smoothing the sequence by averaging it produces a sequence which looks neat.

Thus \omega(n) has normal order \log\log n: almost all numbers n have about \log\log n ; this is the theorem of Hardy-Ramanujan. It’s even better than this : for almost all numbers, \omega(n) is within (\log\log n)^{1/2+\epsilon} of its model \log\log n. In fact the values of \frac{\omega(n)-\log\log n}{\sqrt{\log\log n}} follow a Gaussian distribution, as the Erdös-Kac theorem reveals.

In contrary the divisor function \tau doesn’t have any normal order, but it does have an average order which is quite good-looking, namely \log n (while \log\tau has a normal order which, surprisingly isn’t \log\log n but a bit a less, namely \log 2\log\log n, showing that a few exceptionally highly divisble numbers are able to make the average deviate substantially).

Now of course on the primes \tau and \omega collapse violently, being stuck at respectively 2 and 1. The question is whether just after such a shocking value, at n=p+1, one recovers more normal arithmetic indices, whether there are traces of the past catastrophe (or signs of the coming castrophe just before it).

Of course p+1 isn’t absolutely any number, for instance it’s surely even; but then is (p+1)/2 just any number? It must also be said that it has higher chances to be mutiple of 3 than a generic number, in fact one in two chance instead of one in three, because p+1\equiv 1\text{ or } 0\text{ mod }3 whith equal probability.

From the point of view of \omega

There the answer is: yes, perfectly. Indeed, Heini Halberstam established that the Erdös-Kac theorem holds when n is constrained to range in the shifted primes p+1. That is, \omega(p+1) lies within M\sqrt{\log\log p} of \log\log p for a positive proportion p(M) of primes, the density being again gaussian.

From the point of view of \tau

Not quite. For this consider the Titchmarsh divisor problem, consisting in estimating \sum_{p\leq x} \tau(p+1). If \tau(p+1) was replaceable by \log(p+1) as we suppose, then this sum would be asymptotic to \sum_{p\leq x}\log p\sim x by Mertens formula. It turns out that it is in fact asymptotic to \frac{\zeta(2)\zeta(3)}{\zeta(6)}x, the constant prefactor being well over 1,9, so that it can be said that p+1 has almost twice as many divisors as banal numbers of his size. Now remember it’s always even; now there are good heuristic reasons to believe that \sum_{n\leq x,n\equiv [0] 2}\tau(n)\equiv 2/3x\log x, so that in average \tau(2n) is close to 4/3\log n, but 1,9 is still much larger sensibly larger than 4/3. Here the higher chances of p+1 to be divisible by 3, 5 etc. in comparison to a banal number weigh enough to make the average deviate.

It would be interesting to determine whether \log \tau(p+1) has a normal order, in the same vein as Halberstam’s result.

Another criterion of banality: the probability of being squarefree

As we know, the density of squarefree numbers is 1/\zeta(2)=\prod_{q \text{ prime}}(1-q^{-2}). It is then reasonable to wonder if among numbers of the forme n=p+1, the proportion of square free is the same. It’s clear to see that it can’t quite be so: indeed, p+1 has one on two chance of being divisible by the square 4 (while a generic number has of course one on four chance), one on six chances of being divisible by 9, one on q(q-1) chance of being divisibly by the prime q. So one guesses that p+1 is a little less likely to be squarefree than any number; indeed, it’s been proven by Mirsky that the proportion of squarefree numbers among numbers of the form p+1 is \prod_q (1-1/(q(q-1)), to compare with \prod_{q \text{ prime}}(1-q^{-2}).

The property of being squarefree can be replaced by the property of being k-free for instance, the density among shifted primes being then \prod_q (1-1/(q^{k-1}(q-1)), to compare with \prod_{q \text{ prime}}(1-q^{-k}).

Appendix: average of the divisor function on odd and even numbers

We know that

\displaystyle{x\log x\sim\sum_{n\leq x}\tau(n)=\sum_{\substack{n\equiv 0\mod [2]\\n\leq x}}\tau(n)+\sum_{\substack{n\equiv 0\mod [2]\\n\leq x}}\tau(n)=S_0(x)+S_1(x)}

And let us suppose that \sum_{n\equiv 0\mod [2],n\leq x}\tau(n)\sim Ax\log x and \sum_{n\equiv 1\mod [2],n\leq x}\tau(n)\sim Bx\log x.

Now the sum on even numbers can be decomposed into a sum on numbers of 2-adic valuation 1,2,3… But numbers of 2-adic valuation k are numbers which are of the form 2^km with m odd, so

\displaystyle{\sum_{v_2(n)=k,n\leq x}\tau(n)=(k+1)\sum_{n\leq x/2^k,n\equiv 1[2]}\tau(n)\sim(k+1)Bx/2^k\log x} for fixed k. Pretending this asymptotic also holds when k is allowed to grow with x, we infer from S_0(x)=2S_1(x/2)+(3-2)S_1(x/4)+(4-3)S_1(x/8)\cdots that C=D+D/2+D/4\cdots=2D. Hence S_0(x)\sim 2x/3\log x and S_1(x)\sim x/3\log x.

It might be that this intuition has been proven more rigorously, I’d like to see where!

The genuine prime number theorem says that there are about N/\log N prime numbers between 1 and N. The prime number theorem in function fields is rather about the number a_n of irreducible monic polynomials of degree n in the polynomial ring A=\mathbb{F}_q[t] on a finite field \mathbb{F}_q. It is remarkable that the result is of the same shape q^n/n, which is N/\log N if you let N=q^n be the number of monic polynomials of degree n. But the proof, as often with the polynomial analogues of number theory, is much neater and even yields a nice exact formula for a_n

\displaystyle{q^n=\sum_{d\mid n}da_d\text{ and by Moebius inversion } a_n=\frac{1}{n}\sum_{d\mid n}\mu(d)q^{n/d}}

First proof

This one requires a good deal of knowledge of the theory finite fields.

We prove that \Phi=T^{q^n}-T is the product \Psi of all monic irreducible polynomials whose degree divide n, which immediately provides the above formula by comparison of the degrees. To prove this, notice that the roots of \Phi are all the elements of the extension \mathbb{F}_{q^n} of \mathbb{F}_q, so that \Phi, as an element of \mathbb{F}_{q^n}[t] can be factored as \prod_{\alpha\in \mathbb{F}_{q^n}}(T-\alpha). On the other hand, if P is an irreducible polynomial of degree m\mid n, it splits into linear factors in A/(P)\sim \mathbb{F}_{q^m}\leq \mathbb{F}_{q^n}. So P has a root \alpha\in \mathbb{F}_{q^n} and all its other roots are images of \alpha\in \mathbb{F}_{q^n} under the Frobenius element, more precisely,

P=(T-\alpha)(T-\alpha^{q^{n/m}})\cdots (T-\alpha^{q^{n(m-1)/m}})

so that P\mid \Phi (at first sight in \mathbb{F}_{q^n}[t], but also in A because they are both in A – it is well known that the gcd is preserved under field extensions). Moreover, all monic irreducible P are coprimes, so that their product still divides \Phi. Finally, we need to prove the converse divisibility, either in A or in \mathbb{F}_{q^n}[t], which is equivalent. So let’s do it in the latter; take a factor T-\alpha with \alpha \in\mathbb{F}_{q^n}. Let m\mid n be the order of the Frobenius element on \alpha, so that \alpha^{q^m}=1 and no smaller integer has this property; then (T-\alpha)\mid (T-\alpha)(T-\alpha^{q^{n/m}})\cdots (T-\alpha^{q^{n(m-1)/m}})\in A which is indeed an irreducible of degree m\mid n, hence a factor of \Psi. Given the factorization of \Phi and the fact that the factors T-\alpha are coprime, their product \Phi still divides \Psi

Second proof

This one proves exactly what we want and not more and uses only power series. We need the notation \vert f\vert=q^{\text{deg }f}.


\displaystyle{\zeta_A(s)=\prod_{P\in A\text{ monic irreducible}}(1-\vert P\vert ^{-s})^{-1}=\sum_{f\in A\text{ monic}}\vert f\vert^{-s}}

be the zeta function (for s of real part >1). It is easy to see that \zeta_A=(1-q^{1-s})^{-1} by looking at the sum expression, and \zeta_A=\prod_{d\geq 1}(1-q^{-ds})^{-a_d} by looking at the product expression. Putting u=q^{-s}, both sides become nice power series so that we get the equality


and doing classical operations (logarithmic derivation and expansion in power series followed by termwise identification in particular) on both sides yield the result.

This proof seems to use about zero input of algebra, so its success looks like a miracle. The only step where a (tiny) bit of knowledge seems to be used is the transformation of the sum into an Euler product – it needs the existence of unicity of the decomposition of a monic polynomial into product of irreducibles.

A kind of sieve method ?

It could be reasonable to hope that the sieve of Eratosthenes would work better in the realm of polynomial rings. We’ll try to adapt it here. Notice that in this case we have count primes up to a certain large degree, and not of exactly certain large degree (both quantities are far from equivalent in fact !).

Let’s define \pi(n) to be the number of monic irreducible (let’s say primes) polynomials of degree at most n, and \delta(m)=1_{m=1}. We also use Moebius’ function for monic polynomials, also defined in terms of the number of prime factors. The Moebius inversion formula \delta=\mu\star 1 still holds in the polynomial world. We also write

\displaystyle{\Phi=\prod_{P \text{mon. irred,deg} P\leq n/2}P}


\displaystyle{\pi(n)-\pi(n/2)+1=\sum_{f\text{ monic of deg}\leq d}\delta((f,\Phi))=\sum_{d\mid\Phi}q^{n-\mathrm{deg}d}\mu(d)1_{\mathrm{deg} d\leq n}}

and if we forget about 1_{\mathrm{deg} d\leq n} we obtain exactly

\displaystyle{q^n\prod_{P \text{mon. irred,deg} P\leq n/2}(1-\vert P\vert ^{-1})}

this looks quite nice, but the error term, just like in the integer case, can’t easily be bounded by anything better than O(2^{\pi(n/2)}) which we know will be vastly bigger than the main term. By the way this product can very easily be shown to be \ll 1/n but the exact asymptotic is not any easier than the PNT itself… So this sieve seems to have very little interest, given the convenience of the other methods.