It is an open and interesting problem to determine whether there exist two infinite sets B,C of integers such that B+C\subset\mathcal{P}, where \mathcal{P} is the set of primes. Let’s call this the prime sumset conjecture.

Observe that a set A contains an infinite sumset if, and only if, there exists an infinite set B such that \bigcap_{b\in B}(A-b) is infinite.

We will examine the connections between the Hardy-Littlewood prime-tuple conjecture (HL), the theorems on bounded gaps between primes and the prime-sumset conjecture.

HL and an easy reformulation

HL may easily be seen to be equivalent to the following statement: for any finite set H\subset\mathbb{Z} that does not cover all the classes modulo any prime, i.e. such that H\text{ mod } p\neq \mathbb{Z}/p\mathbb{Z} for every p, there exists an infinite set A\subset \mathbb{N} such that A+H\subset \mathcal{P}. The condition on H is obviously necessary for the conclusion to hold. However HL does not directly imply anything for infinite set of shifts H.

Conversely, if the primes contain a sumset A+B with A,B infinite, then for any finite H\subset B, HL is satisfied for the set H of shifts — but not necessarily for every admissible set.

A stronger form of the HL conjecture, sometimes attributed to Dickson, is the following: let k and (a_1,\ldots,a_k,b_1,\ldots,b_k) be an integer tuple such that (a_j,b_j)=1 for each j and for any prime p, there exists x such that \prod_j(a_jx+b_j)\neq 0\text{ mod } p. Then there exists arbitrarily large x such that a_jx+b_j is prime for every j\leq k.

HL implies prime-sumset

Granville proved that the Hardy-Littlewood conjecture implies the prime-sumset conjecture. Since his proof is more general and complicated, we explain here simply why this is so.

We construct successively integers a_n\in 2\mathbb{Z} and b_n\in 2\mathbb{Z}+1 such that the sets A_n=\{a_k : k\leq n\} and B_n=\{a_k : k\leq n\} satisfy A_n+B_n\subset \mathcal{P}.

Start with a_1=2 and b_1=1 for instance. We can then add a_2=4 and b_2=3. We note that a_3=10 and b_3=37 works.In general, the property we seek to maintain is that A_n \text{ mod } p does not cover the wholeset \mathbb{Z}/p\mathbb{Z} of congruence classes for any p (forp\leq n, otherwise there is no risk) and neither does B_n\text{ mod } p.

One systematic way to achieve this is to always seek a_n>n+1 in the congruence class a_{n-1}\text{ mod } \prod_{p\leq n}p and b_n in the class b_{n-1}\text{ mod } \prod_{p\leq n}, starting from a_1=2 and b_1=1. Assume we have already constructed A_n and B_n with this property.

We seek a_{n+1} in the form qx+a_n for some x\geq 1 and q=\prod_{p\leq n+1}p. Then we must ensure that qx+a_n+b_i is a prime for every i\leq n.

Note that a_n+b_i is a prime greater than n+1 for every i\leq n by the induction hypothesis, whereas q is a product of primes smaller than n+1, so (q,a_n+b_i)=1 for any i. Now for p\leq n the set \{a_n+b_i \text{ mod } p : i\leq n\} equals the set \{a_n+b_{i} : i\leq p-1\} since b_{i}\equiv b_{p-1}\text{ mod } p. Thus we can find x such that qx+a_n+B_n is included in the primes. Let a_{n+1}=qx+a_n. We then define b_{n+1} analogously.

Bounded gaps as an ersatz for HL

Although HL remains clearly out of reach, the breakthroughs of Zhang and then Maynard imply that HL holds for arbitrarily large sets H of shifts. Indeed, Maynard’s theorem implies not only that \liminf p_{n+1}-p_n<\infty where  p_n is the increasing sequence of prime numbers, but also that \liminf p_{n+k}-p_n=L_k<\infty for any k. That is, there exists L_k such that for infinitely many integers n, the interval [n,n+L_k] contains at least k+1 primes. As a result, by the pigeonhole principle, there exists a set H\subset [0,L_k] of cardinality k+1 such that A+H\subset \mathcal{P} for some infinite set A. This argument does not seem to extend to infinite sets H. One would need an infinite increasing sequence H_n of sets such that the set A_n of x such that x+H_n\subset \mathcal{P} is infinite.

But even that would not be enough. Indeed, consider a set such as

\displaystyle{A=\bigcup_k A_k}

where A_k=\{4^k+2^i,\, i\in[k]\}. This clearly has unbounded gaps, and for any n there exists an infinite set A_n such that A_n+\{2^i,\, i\in[n]\} is included in A.

One may see that this set does not contain an infinite sumset though. First note that for any integer b which is not a difference between two distinct powers of two, the set A\cap (A-b) is finite. Further, if b\geq 0 is the difference between two powers of two, it may be realised in a unique way as b=2^i-2^j. Therefore, A\cap (A-b)=\bigcup_{k\geq i} \{4^k+2^j\}. Now take B=\{0<b_1<b_2<\cdots\}. Suppose b_1=2^i-2^j. We may suppose that each b_r is a difference between two powers of two, in fact b_r=2^{i_r}-2^j, otherwise \bigcap_{b\in B}(A-b) is empty. We have A\cap (A-b_1)\cap \cdots\cap (A-b_r)=2^j+\{4^k\,:\,k\geq i_r\}. Therefore \bigcap_{b\in B}(A-b) is empty.

This implies that A does not contain an infinite sumset.

A simple proof that any thick set contains an infinite sumset

It has been proven that any set of positive Banach density contains an infinite sumset. The proof, even in the simplified version of Host, is far from elementary. It may have been noted before, but I found a simple proof in the case of sets of density 1, also called thick sets. Note that such a set A necessarily contains arbitrarily long intervals, sets of the form [d_k,d_k+k] for infinitely many (say all) k. Now we construct inductively b_1<b_2<\ldots and c_1<c_2<\ldots such that B+C is inside A. First, set a_1=d_1 and a_{i+1}=d_{a_i}. Then set b_i=a_{2i} and c_{i}=a_{2i+1}. Then note that [a_i,a_i+a_j] is inside A whenever i>j. As a result, we infer that b_i+c_j is in A for all i,j.

Let A\subset\mathbb{F}_3^n have at least \alpha 3^n elements, for some \alpha >0 (we say that A has density at least \alpha), and very large n.
We know, by Meshulam’s theorem (Roth’s theorem in vector spaces), and even more strongly by Ellenberg-Gijswijt, who used the Croot-Lev-Pach method, that A must contain a 3-term arithmetic progression, in other words an affine line.

But does it have to contain planes or even affine subspaces of larger dimensions?

First we examine the case of a random set A, thus each x\in\mathbb{F}_3^n is taken in A with probability \alpha independently of each other.
This implies that any given subspace has probability
\alpha^{3^k} to be included in A.
Now how many affine subspaces of dimension k are there in a space of dimension n? This is 3^{n-k}\binom{n}{k}_3 where \binom{n}{k}_3 is the number of linear subspaces of dimension k, and is easily seen to equal

\displaystyle{ \binom{n}{k}_3=\frac{(3^n-1)\times\cdots \times(3^{n-k+1}-1)}{(3^k-1)\times\cdots\times (3-1)}.}

One gets the obvious bound

\displaystyle{ \binom{n}{k}_3\leq 3^{kn}.}

Thus the expected number of subspaces of dimension k in A is at most \alpha^{3^k}3^{kn}.
This is of the order of magnitude of a constant
when k is of the order of \log (n/\alpha^{-1}).
So when k\asymp\log (n/\alpha^{-1}), there may be no single subspace of dimension k.
However, using
the lower bound

\displaystyle{ \binom{n}{k}_3\geq \frac{(3^{n-k+1}-1)^k}{3^{k^2}}}

may be used to show similarly that when

\displaystyle{ k\ll \log (n/\alpha^{-1}),}

the set A is likely (in fact, almost sure) to contain subspaces of dimension k.

Now we prove that if a set has positive density and n is large enough, then A must contain a subspace of dimension \Omega(\log n/\log\log n).

The key is basically to use Varnavides averaging argument along with the Ellenberg-Gijswijt bound.
Let n(\alpha, k) be the minimum n (if it exists) such that any set A\in\mathbb{F}_p^n of density \alpha must contain a subspace of dimension k. The Ellenberg-Gijswijt bound amounts to n(\alpha,1)\leq c\log \alpha^{-1}; indeed, \alpha just has to be at least C(2.756/3)^n to ensure a line. Then Varnavides argument is an averaging trick that says that a dense set must contain many lines. So many that it must contain many parallel lines. So many that the set of starting points of these lines must in turn contain a line. In which case we obtain a plane. And so on. This was used by Brown and Buhler to derive the recursive relationship

\displaystyle{ n(\frac{1}{r},k+1)\leq n_0+n(\frac{1}{er^2},1),}

where n_0=n(\frac{1}{r+1},k) and e is the number of linear spaces of dimension k in a space of dimension n_0. We can bound e by 3^{n_0k}, which yields n(\frac{1}{er^2},1)\ll n_0 k+\log r.
And so

\displaystyle{ n(\frac{1}{r},k+1)\ll kn(\frac{1}{r+1},k)+\log r.}

Inducting, one obtains

\displaystyle{ n=n(\frac{1}{r},k)\leq (c')^k k k!\log (r+k).}

Thus \log n\ll k\log k. In other words k\gg \frac{\log n}{\log\log n}, which is exactly the claimed result.

It is well-known that a real number is a rational if, and only, if its decimal expansion is periodic (see Wikipedia). This result is a finiteness result, as it says the decimal expansion is, in a way, not really infinite, as it consists in the repeating infinitely a finite block.

A similar finiteness result exists for Laurent series \alpha=\sum_{i=1}^{\infty}\alpha_i T^{-i} \in K((1/T)), which are the analogues of reals modulo 1. Namely, associate to \alpha the infinite matrix (\alpha_{(i+j)-1})_{i,j\in\mathbb{N}^2}. Then Hankel’s theorem states that \alpha is rational if and only if the matrix is of finite rank.

We show here a quantitative version of this statement. Let

\displaystyle{M_{\alpha,n}=\begin{pmatrix} \alpha_{1} & \cdots & \alpha_{n} & \\ \alpha_{2} & \cdots & \alpha_{n+1} & \\ \vdots & \vdots & \vdots & \\ \alpha_{n} & \cdots & \alpha_{2n-1} & \end{pmatrix}.}

The rank of this matrix is related to Diophantine properties of \alpha.
To see this, we introduce for \beta=\sum_{i=-\infty}^{m}\beta_it^i\in\mathbb{F}_p((\frac{1}{t})) the fractional part \{\beta\}= \sum_{i=-\infty}^{-1}\beta_it^i and the norm \lVert{\beta}\rVert=\lvert\{\beta\}\rvert=q^k, where k\leq -1 is the largest i\leq -1 such that \beta_i\neq 0. Then we have the following result.

If the rank of M_{\alpha,n} is at most r, then there exist a decomposition r=i+j and a monic polynomial P of degree i such that \lVert P\alpha\rVert<q^{-2n+j}. The converse holds too.

The classical theory of Hankel matrices (see Chapter X, Paragraph 10 of Gantmacher’s book) tells us that if the rank of the matrix is r, then there exist integers i and h such that i+h=r and the i first rows are linearly independent, each of the following n-r rows is a linear combination of them,
and the minor formed of the first i and last h rows and columns is nonzero. Let L_1,\ldots,L_n be the rows of M_{\alpha,n}. Write L_m=(L_m',a_m) where a_m is the last coefficient of L_m and L'_m the row of the first n-1 coefficients. By the above, we have a relation
L_{i+1}=\sum_{m=1}^i c_mL_m. In fact, we shall show that for any j=0,\ldots,n-r-1, we have

L_{i+1+j}=\sum_{m=1}^i c_m L_{m+j}.

This equation holds for j=0. So we argue by induction and assume it holds for any j'\leq j for some j<n-r-1 and prove it for j+1. To start with, the displayed equation implies that
\displaystyle{L'_{i+1+j+1}=\sum_{m=1}^i c_mL'_{m+j+1}.}
Applying the induction hypothesis iteratively, we find coefficients c_m^{(k)} for m=1,\ldots,i and k\leq j+1 such that

\displaystyle{L_{i+1+k}=\sum_{m=1}^i c_m^{(k)}L_{m}}

\displaystyle{ L'_{i+1+j+1}=\sum_{m=1}^i c_m^{(j+1)}L'_{m}.}
These coefficients satisfy the initial condition c_m^{(0)}=c_m and the recurrence relations c_m^{(k+1)}=c_{m-1}^{(k)}+c_ic_m^{(k)} for m>1 and c_1^{(k)}=c_h^kc_1.

On the other hand, we know that there exist coefficients d_1,\ldots,d_i
such that
\displaystyle{L_{i+1+j+1}=\sum_{m=1}^i d_mL_{m}.}
Comparing the last equation to the one involving L', and using the linear independence of the first i lines, we find that d_m=c_m^{(j+1)}.

Thus a_{i+1+j+1}=\sum_{m=1}^i c_m^{(j+1)}a_m. But \sum_{m=1}^i c_ma_{m+j+1}=\sum_{m=1}^i c_m^{(j+1)}a_m by definition of the coefficients c_m^{(k)}. We infer that a_{i+1+j+1}=\sum_{m=1}^i c_ma_{m+j+1},
which conclude the inductive argument.

To see the connexion with Diophantine properties of \alpha, notice that L_i is the row of the first n coefficients of \{t^{i-1}\alpha\}. Thus the validity of  the identity for L' for all j=0,\ldots,n-r-1 implies that \{t^i\alpha\}=\{\sum_{m=1}^ic_mt^{m-1}\alpha\}+\beta
where \beta\in\mathbb{T} satisfies \lvert\beta\rvert<q^{-2n+h}.
That is, using the polynomial P=t^i+\sum_{m=1}^ic_mt^{m-1}, we find that \lVert P\alpha\rVert<q^{-2n+h} which concludes the proof of the direct statement. The converse is straightforward.

Let V=\mathbb{F}_p^n be a vector space over a finite prime field. One knows then which subsets A\subset V satisfy A+A=A, the sets that are stable under addition: these are the kernels of linear maps, that is, each such set A is the zero set of a bunch of linear forms.

Now we consider a two-coordinates analogous problem, that is, we consider a set A\subset V\times V and define the sets

\displaystyle{P\stackrel{V}{+}P= \{(x,y_1+ y_2)\mid (x,y_1),(x,y_2)\in P\}}

and P\stackrel{H}{+}P, V and H meaning vertical and horizontal. What are the sets that satisfy both P\stackrel{V}{+}P=P and P\stackrel{H}{+}P=P? Call such a set a transverse set. Natural exemples are cartesian products of vector spaces (which we shall call rectangles) as well as zero sets of bilinear forms. More generally, a set of the form \displaystyle{\{(x,y)\in W_1\times W_2\mid Q_1(x,y)=\cdots=Q_r(x,y)=0\}} for some subspaces W_2,W_2\leq V and some bilinear forms Q_i is horizontally and vertically closed. Call such a set a bilinear set, and call its codimensions the codimensions of the subspaces and the number of bilinear forms required (the dimension of the space of bilinear forms vanishing on it). What is the relationship between transverse sets and bilinear sets? Are all transverse sets essentially bilinear sets?

For a transverse set P\subset V\times V and x\in V, let P_x be the fiber above x. Thus P=\bigcup_x \{x\}\times P_x and P_x is a vector space. Actually P_x depends only the projective class [x]\in P(V). Moreover, the stability under horizontal operations is equivalent to the property that if [z] is on the projective line spanned by x and y, we have P_z\supset P_x\cap P_y.

The codimension 1 case

Suppose P_0=V and each subspace P_x has codimension at most 1. Write P_x=\xi(x)^\perp with \xi(x)\in P(V) some vector uniquely defined up to homothety. It is easy to see that the set of x such that P_x=V is a vector subspace which we call W. Furthermore, if x-y\in W, we have \xi_x=\xi_y, so that \xi descends to an injective map on V/W. Thus it is enough to study the case where P_x has codimension exactly one unless x=0. In other words, we have a map \xi : P(V)\rightarrow P(V) satisfying the property that if z is on the projective line spanned by x and y, then \xi_{z} lies on the projective line spanned by \xi_x and \xi_y. Thus \xi maps projective lines into lines. Actually it can be quickly checked that such a map is either bijective or constant on each single projective line.

If we suppose it is injective, too, then by the fundamental theorem of projective geometry, we conclude that it is a projective map. Thus it is induced by a linear map \xi : V\rightarrow V. Hence writing Q(x,y)=\xi(x)\cdot y, we have shown that P is the zero set of the bilinear form Q.

In general \xi is far from being injective. It can for instance be constant on P(V) (equal to H^\perp for some hyperplane H). In this case P=\{0\}\times V\cup V\times H. This is neither a rectangle, nor the zero form of a quadratic form, because if it was, \xi could be taken linear, and because \xi_x=0\leftrightarrow x=0, it should be injective. In fact, if it is the intersection of zero sets of a family of bilinear forms, there needs to be n bilinear forms in this family, such as the forms x_iy_n=0 if (wlog) H is the hyperplane y_n=0. However P obviously contains a very large rectangle.

These are actually the only cases: such a map \xi is either injective or constant. Indeed, if P(V) is just a line, it’s obvious, so suppose \dim V\geq 3. Suppose that \xi maps projective lines into lines (in the sense above) but is neither injective nor constant. That is, there exist two distinct points x,y such that \xi_x=\xi_y, and a third point z satisfying \xi_z\neq \xi_x. This implies that x,y,z span a projective plane. Take a point w\neq y,z on the line spanned by y,z. Because \xi is a bijection on both lines (yz),(xz), you can find w'\neq x,z on (xz) such that \xi_w=\xi_{w'}\neq \xi_x. Now consider the intersection u=(ww')\cap (xy). Then  you have \xi_u=\xi_x=\xi_y\neq \xi_w, so that on the line (ww'), the map \xi is neither constant nor injective, which is absurd.

Arbitrary codimension

A more general potential counterexample, communicated to me by Ben Green, is a set of the form P=\bigcup_{i=0}^r V_i\times W_i where V_i, W_i are sequences of increasing (resp. decreasing) subspaces.  Thus for x\in V, we have P_x=W_{i(x)} where i(x)=\min\{i\mid x\in V_i\}. Again you can make it a bilinear set, but you need atrociously many bilinear forms.

However it contains a large rectangle. In fact, \dim V_i\leq n-(r-i) while \dim W_i=n-i so that the density of P is at most (r+1)p^{-r}\ll p^{-r/2}. Thus r\ll \log\delta^{-1}. Finally by the pigeonhole principle, one of the cartesian products V_i\times W_i have density at least \delta/\log\delta^{-1}.

In general, we have a vector space P_x=A_x^\perp. Suppose for instance each A_x has codimension 2. If we manage to find maps x\mapsto \xi_1(x),\xi_2(x) such that \xi_i(x+y)\in\text{span} (\xi_i(x),\xi_i(y)) for each i,x,y, we are in good shape. Though it is not clear how to show that such a consistent choice of bases can be made…

The conjecture

If a A\subset V\times V has cardinality at least \delta \lvert V\rvert ^2, it contains a bilinear set of (linear and bilinear) codimensions c(\delta)=O(\log^{O(1)}\delta^{-1}).


Given a compact abelian group G, endowed with a Haar probability measure \mu, let \mathcal{F}_a be the set of (Borel) measurable functions G\rightarrow [0,1] satisfying \int_G fd\mu \geq \alpha. The average of f is called its density.

Let \Psi be system of linear forms, i.e \Psi=(\psi_1,\ldots,\psi_t): G^d\rightarrow G^t and


where c_{i,j}\in\mathbb{Z} for any i\in[t] and j\in[d]. For a measurable function f : G\rightarrow [0,1], one may define the \Psi-count of f as

\displaystyle{T_\Psi(f)=\int_{G^d}\prod_i f(\psi_i(x))d\mu(x).}

Here by a slight abuse of notation, the measure \mu on G^d denotes in fact the product measure \mu^{\otimes d} of the Haar measure on G.



When a system \Psi is fixed, we may omit it from the notation m_\Psi. Given a sequence G_n of finite groups of cardinality tending to infinity, it is natural to consider the sequence m(\alpha,G_n). When the infimum is taken only over \{0,1\}-valued measurable functions of density at least \alpha, this is the minimum number of APs a set A\subset G_n of density at least \alpha can have. But these infimum are actually the same (basically by convexity).

The case G_n=\mathbb{Z}/n\mathbb{Z} was introduced by Croot. He proved that the minimal density m(\alpha,\mathbb{Z}_N) of three-term arithmetic progressions converged as N\rightarrow\infty through the primes. Sisask, in his PhD thesis, was able to express this limit as an integral over the corresponding limit object, namely the circle \mathbb{T}=\mathbb{R}/\mathbb{Z}, thus

\displaystyle{\lim_{\substack{N\rightarrow +\infty\\ N\text{ prime}}} m(\alpha,\mathbb{Z}_N)=m(\alpha,\mathbb{T}).}

As customary, the corresponding problem in finite field models is cleaner. Thus we set G_n=\mathbb{F}_p^n. Whereas Croot noticed that the sequence m(\alpha,\mathbb{Z}_N) was in some sense approximately decreasing, it is easy to see that for any system \Psi, the sequence m_\Psi(\alpha,G_n) is genuinely non-increasing. This is because \mathbb{F}_p^n embeds naturally in \mathbb{F}_p^{n+1}; thus if f is a map from \mathbb{F}_p^n to [0,1], it can be extended into \tilde{f} : \mathbb{F}_p^{n+1}\rightarrow [0,1] by putting \tilde{f}(x_1,\ldots,x_{n+1})=f(x_1,\ldots,x_n).

Now \mathbb{E} \tilde{f}=\mathbb{E}f, and T_\Psi(f)=T_\Psi(f). This proves that m_\Psi(\alpha,G_n)\geq m_\Psi(\alpha,G_{n+1}), which implies that this sequence has a limit.

Let \Psi be now fixed. We want to find an appealing expression for the limit of the sequence m(\alpha,G_n). It is thus tempting to look for a natural topological compact group that would play the role of limit object of the sequence G_n, just as \mathbb{T} played the role of limit object for the sequence \mathbb{Z}/N\mathbb{Z}.

The analogy \mathbb{Z}\leftrightarrow \mathbb{F}_p[t]=\cup \mathbb{F}_p^n=\mathbb{F}_p^{\infty} leads one to introducing G_\mathbb{N}=\mathbb{F}_p^\mathbb{N} and G_\infty=\mathbb{F}_p^{\infty}. The group G_\mathbb{N} equipped with the product topology is a compact group by Tychonoff’s theorem, and G_\infty equipped with the discrete topology is locally compact; it is the Pontryagin dual of G_\mathbb{N}. It is now reasonable to ask whether

\displaystyle{\lim m(\alpha,G_n)=m(\alpha, G_\mathbb{N}).}

One easily notices that m(\alpha,G_n)\geq m(\alpha, G_\mathbb{N}) for any n\in\mathbb{N} by extending, as above, a function f : \mathbb{F}_p^n \rightarrow [0,1] into a function \tilde{f} on \mathbb{F}_p^\mathbb{N} with the same density and the same \Psi-count. Indeed, one can define


and then \tilde{f} is a measurable function which has same density and same \Psi-count.

We prove the other inequality. For this we prove two intermediate propositions. We write from henceforth G=G_\mathbb{N}.

Proposition 1

If f_n is a sequence of [0,1]-valued functions on G that converges in L^2(G) to a [0,1]-valued function f, then T(f_n)\rightarrow T(f).


Let F_n(x)=\prod_{i\in[t]} f_n(\psi_i(x))-\prod_{i\in[t]} f(\psi_i(x)). We show that F_n converges in L^1(G^d) to 0, which will conclude. In fact we prove a stronger convergence, a convergence in L^2. We show that \lVert F_n-F\rVert_{2}\leq t \lVert f_n-f\rVert_2. For that we simply remark that


Now, any non-trivial linear map \theta : G^d\rightarrow G, i.e. any map of the form

\displaystyle{\theta(x_1,\ldots,x_d)=\sum_{i=1}^d c_ix_i.}

with (c_1,\ldots,c_d)\in \mathbb{F}_p^d\setminus\{0\}, preserves the Haar measure, which implies that

\displaystyle{\lVert f_n\circ\psi_j-f\circ\psi_j\rVert_{L^2(G^d)}=\lVert f_n-f\rVert_{L^2(G)}.}

Using this, the triangle equality and the fact that the functions are 1-bounded, we conclude.

Proposition 2

Let \mathcal{F}_{a,n} be the set of functions \tilde{f} for f : G_n\rightarrow [0,1] of density at least \alpha. Then \mathcal{F}_{a,n} is dense in (\mathcal{F}_a, \lVert \cdot\rVert_2).


Let f\in\mathcal{F}_aWe construct a sequence of functions f_n\in\mathcal{F}_{a,n} that converge to f. We use duality and Fourier transform. We have \hat{G_n}=G_n and \hat{G_\mathbb{N}}=G_\infty=\cup G_n.

For \chi\in G_\infty, write


Then write


This defines a function f_n : G_n \rightarrow \mathbb{C}, which obviously satisfies


The same holds for its extension \tilde{f_n}Notice that \tilde{f_n} is also the conditional expectation with respect to the \sigma-algebra \mathcal{P}(G_n)More precisely,

\displaystyle{f_n(x_1,\ldots,x_n)=\mathbb{E}_{y\in G}(f(y)\mid (y_1,\ldots,y_n)=(x_1,\ldots,x_n))}

as both equal probacondi.

This implies in particular that f_n, and thus \tilde{f_n}has its values in [0,1].

To check that \mathbb{E} f_n \geq \alpha, we use the conditional expectation interpretation

\displaystyle{\mathbb{E}[f_n]=\mathbb{E}[\mathbb{E}[f\mid G_n]]=\mathbb{E}[f]\geq \alpha.}

Now we check the convergence. It follows from basic harmonic analysis: f being in L^2(G), its Fourier series f_n converges to f in L^2(G). This concludes the proof of Proposition 2.

Thus even for 4-APs, this shows that


exists and equals

\displaystyle{\inf_{f:\mathbb{F}_p^\mathbb{N}\rightarrow[0,1],\int f\geq \alpha}\int f(x)f(x+y)f(x+2y)f(x+3y)dxdy.}

This is in contrast to a paper of Candela-Szegedy, for the same pattern \Psi (4-APs) but with G_n=\mathbb{Z}/n\mathbb{Z} where they obtain as a limit a much more complicated infimum, an infimum over functions f on \mathbb{T}^2 of the \Psi'-count, where \Psi'\neq\Psi is another pattern. The part that holds in the setting of \mathbb{F}_p^n and not in the setting of \mathbb{Z}_p is Proposition 2. The truncated Fourier series of f : \mathbb{T}\rightarrow[0,1], which is the natural L^2 approximation, does not give a function on cyclic groups \mathbb{Z}_p with the same density and AP-count as f.

We remark that the paper of Hatami-Hatami-Hirst implies that the sequence of functions f_n admits a limit object g which is defined on a finite cartesian power of G_\mathbb{N}.

In fact, if \Psi is the k-APs, pattern of complexity k-2, and p\geq k, the limit object g is defined on G_\mathbb{N}^{k-2}. In particular, if k=3, this means that the infimum above is attained, i.e.

\lim_{n\rightarrow \infty}m(\alpha, G_n)=T(g)

for some g : \mathbb{F}_p^\mathbb{N}\rightarrow [0,1] measurable. When gnzn, it is still not known whether such a result (i.e. the existence of a limit object g : \mathbb{T}\rightarrow [0,1]) holds. However, Szegedy proves the existence of a limit object g : X\rightarrow [0,1] on a non-explicit group X.


A recent breakthrough due to Croot, Lev and Pach, recycled by Ellenberg-Gijswijt and streamlined by Tao on this blog post, revolutionised the landscape of Roth’s type problems in vector spaces over finite fields. Indeed, Ellenberg-Gijswijt obtained that a set A\subset \mathbb{F}_3^n free of triple satisfying x_1+x_2+x_3=0 can have at most 2.756^n elements out of the 3^n out there. Here we show the same method can deal with sets free of solutions to various other equations in any \mathbb{F}_p^n (given that if q=p^k, we have \mathbb{F}_q^n\sim \mathbb{F}_p^{kn} as a vector space, there’s no point in considering general \mathbb{F}_q^n) and we make explicit the dependence on p of the resulting bound. We refer in the sequel mostly to Tao’s symmetric proof instead of the original articles. One great merit of this exposition is to reveal, thanks to the extension of the notion of rank from matrices (seen as functions of two variables) to functions of more variables, how the scope of the method can be extended.

Other linear equations

In fact the bound found for the sets without 3-term arithmetic progressions is also valid
for a set A\subset \mathbb{F}_p^n having no non-trivial solutions to
where \phi_1,\phi_2,\phi_3 are three linear automorphisms of \mathbb{F}_p^n summing to 0. Indeed,
we can then still write
\displaystyle{\delta_{0^n}(\phi_1(x_1)+\phi_2(x_2)+\phi_3(x_3))=\sum_{a\in A}\delta_a(x_1)\delta_a(x_2)\delta_a(x_3).}
Even more generally, for any integer k\geq 3, if \phi_1,\ldots,\phi_k are k automorphisms of \mathbb{F}_p^n which sum to 0, and if A\subset \mathbb{F}_p^n does not have any non-trivial solution to the equation \sum_{i\in[k]}\phi_i(x_i)=0, we can still write

\displaystyle{ \delta_{0^n}(\sum_{i\in[k]}\phi_i(x_i))=\sum_{a\in A}\prod_{i\in[k]}\delta_a(x_i). }

The right-hand side is still a function on A^k of rank \lvert A\rvert. Meanwhile, the left-hand side may still be expanded using the formula \delta_{0}(x)=1-x^{p-1} valid for all x\in\mathbb{F}_p, which implies that the LHS equals

\displaystyle{ \prod_{j\in[n]}(1-(\sum_{i\in[k]}\phi_{i}^j(x_i))^{p-1}) }

where \phi_i^j means the j-th coordinate of \phi_i; it is
thus a linear polynomial. This implies that the whole expression is a polynomial of degree (p-1)n. The treatment shown by Tao on his blog to bound the rank of this expression can thus be applied here too.

Roth’s theorem is simply the case where k=3 and \phi_1=\phi_2=\phi_3=\text{Id} for p=3, or more generally \phi_1=\phi_2=\text{Id} and \phi_3=-2\text{Id}.

The more variables, the smaller the bound becomes. For instance, a set A\subset \mathbb{F}_5^n without any solution to the equation x+3y=2z+2w, in particular without any 3-APs, has size at most 3.83^n, less than the bound for sets without 3-APS which is 4.46^n.

Even non-linear equations

Upon scrutinizing the polynomial method at work in Ellenberg-Gijswijt, one can be struck by the fact that the translation-invariance of the equation x+y+z=0 is not even used. In Meshulam’s theorem, as in all form of density increment strategy, we crucially use the fact that if (x,y,z) is a solution and if c\in\mathbb{F}_p^n is any constant, then (x+c,y+c,z+c) is a solution too. What is used in the polynomial strategy is that any constant tuple (c,\ldots,c) is a solution, which is vastly weaker (though it is synonymous for linear patterns). Thus if \phi_1,\phi_2,\ldots,\phi_k are any bounded-degree polynomial maps \mathbb{F}_p^n \longrightarrow\mathbb{F}_p^m (i.e. maps whose coordinates are polynomials of degree at most d) summing to 0, then the method should, in principle, spit out a bound for the maximal size of a set A\subset \mathbb{F}_p^n free of solutions to \sum\phi_i(x_i)=0, depending on the degree. The only thing that changes in Tao’s write-up is that the polynomial


 whose rank we try to bound is now of degree up to (p-1)md. Thus the parameter of relevance and which determines whether the method will succeed or not is m/n \cdot d/k. It has to be significantly less than 1/2. Otherwise, the method requiring a good bound (exponential if possible) on

\mathbb{P}(X_1+\cdots+X_n\leq (p-1)m d/k)

where X_1,\ldots, X_n are independent, identically distributed variable on \{0,\ldots,p-1\}, we are hopeless.

Thus, considering m=2n,d=2, we need k=9 variables.

Equations in a polynomial ring

As advocated by Tom Bloom (for instance in this paper), it might be of interest to ask for analogues of theorems in polynomial ring. However, a relation of the form f_1+f_2=2f_3 among three polynomials over \mathbb{F}_q of degree less than n is simply a relation among three vectors, the vectors of their coefficients. But we can form much more general equations, such as

\displaystyle{\sum_{i=1}^k\alpha_i f_i=0}

for some \alpha_1,\ldots,\alpha_k \in \mathbb{F}_p[t]. We require again \sum_i\alpha_i=0. Now the map f\mapsto \alpha f on the polynomials of degree less than n translates, in term of the coefficients of a polynomial, into a linear map \phi from \mathbb{F}_p^n to \mathbb{F}_p^{n+d} where d= \text{deg }\alpha.
Let d be the maximum of the degrees of the \alpha_i and let us introduce the maps \phi_i :\mathbb{F}_p^n\mapsto \mathbb{F}_p^{n+d} corresponding to asking for the size of A\subset \mathbb{F}_p[t] without any solution to the equation above amounts to asking for the size of a set A\subset \mathbb{F}_p^n without any solution to the equation \sum_i\phi_i(x_i)=0, a situation we have already analysed.
Even better, translating the map f\mapsto f^2 into a a polynomial map \mathbb{F}_p^n\mapsto \mathbb{F}_p^{2n-1}, namely
we’re back to the previous paragraph. Here m=2n-1,d=2, so that to ensure md/nk<1/2, we need k\geq 9. We then get a bound \lvert A\rvert \ll (0.99 p)^n. This number of variables is terrible. In fact, in the integers, Browning and Prendiville showed that 5 variables suffice for such an equation, that is, they got a bound for the size of a set A\subset [N] without solution to the equation \sum_{i=1}^4 x_i^2=4x_5^2. It is not satisfactory to end up needing more variables in the function field setting than in the integer case. The polynomial method does not seem good at tackling polynomial equations.

Dependence on p

Let’s return to the case of Roth’s theorem.
Following Tao, one gets a bound of the form \lvert A \rvert\ll n^{p-1}\exp(nc_p)
where c_p is the maximum of

\displaystyle{ h(\alpha_0,\ldots,\alpha_{p-1})=-\sum_{k=0}^{p-1}\alpha_k\log\alpha_k}

under the constraints

\displaystyle{ \alpha_i\geq 0, \sum \alpha_i=1,\sum i\alpha_i\leq (p-1)/3}

It could be interesting to know how c_p evolves with p.
Using the help of a computer to compute c_p for p upto 10^8, I got the feeling that \exp(c_p)/p\approx 0.84+0.30/p. This would mean that a subset A\subset \mathbb{F}_p^n free of 3-term arithmetic progressions necessarily satisfies \lvert A \rvert\ll (0.85p)^n.

We now proceed to show that the empirically conjectured behaviour of c_p is indeed the true one.

The constrained extrema theorem implies that the maximum of h is attained on a point where

\displaystyle{\overrightarrow{\mathrm{grad}} h=-(\alpha_0\log\alpha_0+1,\ldots,\alpha_{p-1}\log\alpha_{p-1}+1)}

belongs to \text{Vect}((1,\ldots,1),(0,1,\ldots,p-1)). This condition amounts to the existence of a,b\in\mathbb{R} so that \alpha_i\log\alpha_i=\exp(1-a-ib) for all indices i. So we have only two degrees of freedom for the tuple (\alpha_0,\ldots,\alpha_{p-1}. And because we have two equations

\displaystyle{ \sum \alpha_i=1,\sum i\alpha_i= (p-1)/3}

there should be a unique such tuple.
Putting q=\exp(-b), we transform the first constraint in

\displaystyle{ \exp(1-a)\sum_{i=0}^{p-1} q^i=1 }

thus the second one amounts to

\displaystyle{ \sum_{i=0}^{p-1}iq^i=\frac{p-1}{3}\sum q^i. }

Here the LHS equals

\displaystyle{ \frac{((p-1)q-p)q^p+q}{(q-1)^2}. }

Thus the equation to solve is

\displaystyle{ ((p-1)q-p)q^p+q=\frac{(p-1)(q-1)}{3}(q^p-1), }

in other words

\displaystyle{ 2(p-1)q^{p+1}-(2p+1)q^p+3q+(p-1)q=(p-1). }

It clearly imposes that as p\rightarrow\infty, we must have q\rightarrow 1, for if it remains bounded away from 1, the left-hand side is asymptotic to (p-1)q, which is not asymptotic to p-1, the right-hand side.

We now rearrange the equation as

\displaystyle{ p(q+2q^{p+1}-2q^p)-2q^{p+1}-q^p+2q=p-1 }

that is

\displaystyle{ p(q-1)(1+2q^p)=2q^{p+1}+q^p-2q-1=(q^p-1)(2q+1). }

Let us introduce \epsilon=1-q\rightarrow 0 and determine the behaviour of p\epsilon, or equivalently of q^p. One can prove by contradiction that p\epsilon cannot tend to 0. Indeed, supposing it did, one could write

\displaystyle{ q^p=\exp(p\log(1-\epsilon))=\exp(-p\epsilon)\exp(O(p\epsilon^2)) =1-p\epsilon+(p\epsilon)^2/2+o(p\epsilon)^2. }


\displaystyle{ -p\epsilon(3-2p\epsilon+(p\epsilon)^2+o(p\epsilon)^2)=(-p\epsilon+(p\epsilon)^2/2+o(p\epsilon)^2)(3-2\epsilon) }

and comparing the term in (p\epsilon)^2 we see an absurdity.
It can’t either tend to \infty. If it has a limit point \alpha, it has to be a solution to

\displaystyle{ -\alpha(1+2\exp(-\alpha))=3(\exp(-\alpha)-1). }


The computer declares that the solution to this equations equals 2.149125799907062 (which is very close, fortunately enough, to values observed for p(1-q) for large p). Thus q^p tends to \exp(-\alpha)\approx 0.11.

Injecting the expression for the \alpha_i in terms of a,b,  the optimum is

\sum_k(a+kb-1)\exp(1-a-kb) = \exp(1-a)\left((a-1)\sum q^k+b\sum kq^k\right)

which we can equate, remembering earlier formulae, to
So let us derive an asymptotic expression for a and b. We quickly obtain a=\log p+\log\frac{1-\exp(-\alpha)}{\alpha}+1+o(1), b=\alpha/p+o(1/p).
So the maximum we are after equals

\displaystyle{c_p= \log p+\log\frac{1-\exp(-\alpha)}{\alpha}+\alpha/3+o(1) }

This implies that

\displaystyle{\frac{\exp(c_p)}{p}= \frac{1-\exp(-\alpha)}{\alpha}\exp(\alpha/3). }

Injecting the value we found for \alpha, the computer spits out 0.84, as guessed.


Many properties of the set of the primes are inherited by any dense subset thereof, such as the existence of solutions to various linear equations. But there’s one property, recently proved, namely the existence of infinitely many gaps p_{n+1}-p_n\leq C bounded by a constant, which is not inherited by dense subsets of the primes. For instance, it is possible to remove basically one half of the primes and be left with a subset of primes \{p'_n\mid n\in\mathbb{N}\} such that p'_{n+1}-p'_n\ll \log p_n.

So instead one could ask if when coloring the primes in finitely many colours, one is guaranteed to find a colour class containing infinitely many pairs of primes whose difference is bounded by a common constant. This does not follow immediately from pigeonhole principle, which only says that one colour class contains infinitely many primes.

However, this follows easily if we recall that the work of Maynard does not only give infinitely many bounded gaps, that is \liminf p_{n+1}-p_n<+\infty, but also for any m that \liminf p_{n+m}-p_n < +\infty. Let H(m) be this liminf.

In other words there are infinitely many (disjoint) intervals of size I_n=[x_n,x_n+H(m)] which contains at least m+1 primes. So if you colour the primes in m colours, and you consider these intervals, you find that each one must contains at least two primes of the same colour. By pigeonhole principle again, there must be a colour for which this occurs infinitely often. Whence the existence in this colour class of infinitely gaps of size at most H(m).

The squarefree numbers form a set which is well-known to have density 6/\pi^2 in the integers, in other words their characteristic function \mu^2 satisfy \sum_{n\leq x}\mu^2(n)=6/\pi^2x+o(x). In this set all your dream theorems about the count of linear patterns are true and easy.

Hardy-Littlewood squarefree tuples

The Hardy-Littlewood conjecture consists, in its strongest form, in an asymptotic, for every \mathcal{H}=\{h_1,\ldots,h_k\}, of the form

\displaystyle{\lvert\{n\leq x\mid n+h_1,\ldots,n+h_k\text{ are all prime}\}\rvert=(c(\mathcal{H})+o(1))\frac{x}{\log^kx}}

where c(\mathcal{H}) is a constant possibly equal to 0. It is quite far from being proven, although interesting steps towards it have been made in the last ten years, by Goldston-Pintz-Yildirim and Maynard.

In the squarefree numbers, the analogous problem is rather easy and was settled by Mirsky in 1947. He proved that

\displaystyle{\lvert\{n\leq x\mid n+h_1,\ldots,n+h_k\text{ are all squarefree}\}\rvert=c(\mathcal{H})x+O(x^{2/3+\epsilon})}



and \nu_{\mathcal{H}}(p^2) is the cardinality of the size of the projection of the set \mathcal{H} in \mathbb{Z}/p^2\mathbb{Z}, that is the number of forbidden classes modulo p^2 for n.

I show a quick way of deriving it with a much worse error term O(x/\log x). Fix w, a large number; ultimately, w=C\log x should be ok. Fix W=\prod_{p\leq w}p^2. Then the number X_w of n\leq x such that none of n+h_1,\ldots,n+h_k have a square factor p^2 with p\leq w is easy to compute: in [W] or any interval of length W there are by chinese remainder theorem this many of them
\displaystyle{ \prod_{p\leq w}(p^2-\nu_{\mathcal{H}}). }
Thus in [x], there are this many of them

\displaystyle{x/W^2\prod_{p\leq w}(p^2-\nu_{\mathcal{H}})+O(W^2)}.

There are also people who are not squarefree, but escape our sieving by primes less than w. These are the guys such that at least one of the n+h_i have a divisor p^2 with p>w prime. Now there are are at most x/p^2 multiples of p^2 in [x]. So at most
\displaystyle{ \sum_{w<p\leq \sqrt{x+\max h_i}}k(x+\max h_i)p^{-2}=o(x/w) }
should be removed. Let’s try to balance the error terms O(x/w) and O(W^2)=\exp(2w(1+o(1)); take w=\frac{1}{2}(\log x-\log\log x).
This way both seem to be O(x/\log x).

Green-Tao type theorem

We can also easily get asymptotics of the form

\displaystyle{\sum_{n\in\mathbb{Z}^d\cap K}\prod_{i\in[t]}\mu^2(\psi_i(t))=\text{Vol}(K)(\prod_p\beta_p+o(1))}

where K\subset [0,N]^d is a convex body and \Psi a system of affine-linear forms of which no two are affinely related. Moreover \beta_p is again the proportion of non-forbidden vectors a=(a_1,\ldots,a_d) of residues modulo p^2. One way to do it would be to simply use the Hardy-Littlewood type asymptotics proved above coordinate by coordinate. We can also prove it quite easily, by first observing that the set of n where at least one of the \psi_i vanishes has size O(N^{d-1}) because the 0-set of an affine-linear form is an affine hyperplane. Then as \psi_i(n)=O(N), the divisors a_i will have to satisfy a_i\ll \sqrt{N}. So now we restrict the set of n to the ones where the forms don’t vanish. Then we partition the sum into one where a_1 is restricted to be at most N^{\delta} (we fix some small \delta >0), and the remaining one. Using the well-known fact that \mu^2(n)=\sum_{d^2\mid n}\mu(d), we remark that the remaining one equals

\displaystyle{  \sum_{N^{\delta}\leq a_1\ll \sqrt{N}}\mu(a_1)\sum_{n:a_1^2\mid \psi_1(n)}\mu^2(\psi_2(n))\cdots\mu^2(\psi_t(n))}

Now the number of n in the sum is O(N^d/a_1^2+N^{d-1}). Given that \sum_{N^{\delta}\leq a_1} a_1^{-2}\ll N^{-\delta}, we can bound this sum by O(N^{d-\delta}). We proceed similarly for i=2,\ldots,t, so that the sum to estimate is

\displaystyle{  \sum_{a_1,\ldots,a_t\leq x^\delta}\prod_i\mu(a_i)\lvert\{n\in\mathbb{Z}^d\cap K\mid \forall i \quad \psi_i(n)\equiv 0\mod a_i^2\}\rvert}

Quite gross volume packing arguments indicate that the number of integral points (lattice points) to estimate above is \text{Vol}(K)\alpha_{\Psi}(a_1^2,\ldots,a_t^2)+O(N^{d-1+2t\delta}), where

\displaystyle{ \alpha_{\Psi}(k_1,\ldots,k_t)=\mathbb{E}_{n\in [\text{lcm}(k_i)]^d}\prod_i1_{k_i\mid\psi_i(n)}}


is the density of zeroes modulo the a_i^2. Moreover we can extend the sum beyond N^\delta at the cost of a mere N^{O(\delta)}. Hence finally for any \epsilon >0 we can write that

\displaystyle{\sum_{n\in\mathbb{Z}^d\cap K}\prod_{i\in[t]}\mu^2(\psi_i(t))=\text{Vol}(K)\prod_p\beta_p+o_\epsilon(N^{d-1+\epsilon}).}

Indeed by multiplicativity it is easy to transform \sum_{a_1,\ldots,a_t}\alpha_{\Psi}(a_1^2,\ldots,a_t^2)\prod_i\mu(a_i) into a product over primes.





The quest for repeated, infinitely frequent patterns in the primes, is certainly a very old one, and is often even a quest for the asymptotic frequency of these patterns, which is much harder. For instance proving that there are infinitely many primes is easy, finding a satisfactory answer for the question « How many (roughly) until N large ? » much less so. Inbetween lies the obtention of upper and lower bounds of the right shape, due by Chebyshev.

Here pattern stands for images of a system of polynomial forms. Given \Psi=(\psi_1,\cdots,\psi_t)\in(\mathbb{Z}[X_1,\cdots,X_d])^t and a convex set K=K_N\subset\mathbb{R}^d of volume increasing to infinity, we are thus interested in evaluating

\displaystyle{\sum_{n\in K\cap\mathbb{Z}^d}\prod_{i=1}^t\Lambda(\psi_i(n))}

for N large, where \Lambda is the von Mangoldt function. The case where d=1 and \psi_i(n)=n+b_i is the original question of Hardy and Littlewood, who proposed a tantalizing asymptotic behaviour but is still completely out of reach (even the question whether there are infinitely many n such that n,n+2 are both primes is not settled). But the case where the system \Psi is affine-linear (thus the polynomials are all of degree 1) and no two forms are affinely dependent was solved by Green and Tao in the celebrated article Linear equations in primes.

Similar results for more general polynomial forms are rare. We have to mention the famous work of Friedlander and Iwaniec yielding an asymptotic for the number of primes of the form p=x^2+y^4, where it appears that

\displaystyle{\sum_{x\leq\sqrt{N},y\leq N^{1/4}}\Lambda(x^2+y^4)}\sim Cx^{3/4}

for some constant C>0.

I have uploaded yesterday an article on the ArXiv which provides asymptotics of the same shape as the ones in the Hardy-Littlewood for a few exceptional polynomial patterns.  Thus for instance, I can tell how many arithmetic progressions of three primes smaller than N exist whose common difference is a sum of two squares – well not quite, because I have to weigh these arithmetic progressions by the number of representations of the common difference. Now this weight, giving a positive density to the set of sums of two squares, which is sparse, of density N/\sqrt{\log N}, just as the von Mangoldt function is a weight (mostly) on primes giving them a density, cannot be easily eliminated afterwards, in contrast to the von Mangoldt function (one can write for n\leq N that \Lambda(n)\sim 1_\mathbb{P}(n)\log n\sim 1_\mathbb{P}(n)\log N).

More precisely, the result that naturally comes out concerning three term arithmetic progressions with common difference a sum of two squares is

\displaystyle{\sum_{1\leq a\leq a+2d\leq N}\Lambda(a)\Lambda(a+d)\Lambda(a+2d)R(d)=\pi N^2/4\prod_p\beta_p+o(N^2)}

where R(n)=\mid\{(x,y)\in\mathbb{Z}^2\mid n=x^2+y^2\}\mid is the representation function and \beta_p are some explicit constant which I don’t reproduce here. Moreover, we can generalise to other positive definite binary quadratic forms than this one, and there’s nothing special about length three: an asymptotic is available for any length. Here we notice that in some sense, the result is only seemingly polynomial, and truly rather linear: the polynomial nature of the pattern is enclosed in a linear input into the representation function of a quadratic form.

In fact, my article contains a more general result of which the one above is only a particular case. My work consisted in mingling the von Mangoldt function with the representation functions of quadratic forms, whose behaviour on linear systems have been already analysed respectively in by Green and Tao and Matthiesen. The idea is to consider sums of the form

\displaystyle{\sum_{n\in\mathbb{Z}^d\cap K}\prod_{i=1}^tF_i(\psi_i(n))}

where F_i can be the von Mangoldt function or a representation function, and the $\psi_i$ are linear forms. The cohabitation of both types of functions went quite well. One delicate point was to eliminate biases modulo small primes of both types functions, an operation known as the W-trick. The difficulty is that while the value of the von Mangoldt function is more or less determined by the coprimality to small primes, it is not so for the representation function, which is also sensitive to the residue modulo large powers of small primes. Once this issue is adressed carefully, it is possible to majorize them by one and the same pseudorandom majorant, which opens the way to the application of the transference principle.

Similarly, the cohabitation between the von Mangoldt function and the divisor function is quite natural, yielding asymptotics for expressions such as \sum\Lambda(n)\Lambda(n+ab)\Lambda(n+2ab)=\sum\Lambda(n)\Lambda(n+d)\Lambda(n+2d)\tau(d). This is reminiscent of the Titchmarsh divisor problem, the evaluation of \sum_n\Lambda(n)\tau(n+a) or (almost equivalently) of \sum_p\tau(p+a), but the latter expression involves a linear system of infinite complexity, and is thus altogether out of reach of my method, just as the twin primes or the basic Hardy-Littlewood conjecture.

One may hope to extend the generalised Hardy-Littlewood conjecture stated (and proven in the finite complexity case in the paper linked) by Green and Tao to polynomial systems. For example given a polynomial \displaystyle{\phi \in \mathbb{Z}[x_1,\ldots,x_d]} and a bounded domain K\subset \mathbb{R}^d (probably a convex body would be more reasonable), one may be interested in an asymptotic for

\displaystyle{\lvert\{x=(x_1,\ldots,x_d)\in K\cap \mathbb{Z}^d\mid \phi(x)\text{ is prime}\}\rvert.}

We will rather try to get an asymptotic of the form

\displaystyle{\sum_{(x_1,\ldots,x_d)\in\mathbb{Z}^d\cap K}\Lambda(\phi(x_1,\ldots,x_d))=\beta_{\infty}\prod_p\beta_p+o(\beta_{\infty})}

where \beta_{\infty} is basically the number of points with positive integer coordinates in K, so hopefully \beta_{\infty}=\text{Vol}(K\cap \phi^{-1}(\mathbb{R}_+)), and the local factors take into account the obstructions or the absence of obstructions to primality modulo p. Recall that \Lambda classically denotes the von Mangoldt function. There are only very few non-linear polynomials for which an asymptotic for the number of prime values is available. The easiest one is obviously \phi(x,y)=x^2+y^2. Indeed, the primes which are sum of two squares are the primes congruent to 1 modulo 4 (and also the prime 2, but it’s a single prime, so we don’t have to care), and they are represented 8 times each as a value of \phi. So

\displaystyle{\sum_{x^2+y^2\leq N}\Lambda(x^2+y^2)\sim\sum_{n\leq N,n\equiv 1\text{ mod } 4}\Lambda(n)\sim 4N.}

Now let’s check what the conjecture would say. Here \beta_{\infty}=\text{Vol}(\{(x,y)\mid x^2+y^2\leq N\})\sim \pi N. What about the \beta_p ? They are supposed to be \beta_p=\frac{p}{p-1}\mathbb{P}(\phi(x,y)\neq 0\text{ mod } p). Now it easy to check that \mathbb{P}(\phi(x,y)\equiv 0\text{ mod } p)=\frac{2p-1}{p^2} if p\equiv 1\text{ mod } 4, and \mathbb{P}(\phi(x,y)\equiv 0\text{ mod } p)=\frac{1}{p^2} otherwise. So \beta_p=\frac{p-1}{p} in the former case, and \beta_p=\frac{p+1}{p} in the latter case. Thus, \prod_p\beta_p doesn’t converge absolutely, in contrast with the traditional Green-Tao situation, where \beta_p=1+O(1/p^2)… However, if we imagine that to each prime p\equiv 1\text{ mod } 4 corresponds a prime p'\equiv 3 \text{ mod } 4 with p\approx p', we could compute the product of the \beta_p by grouping the factors into pairs \beta_p\times\beta_{p'}\approx 1-1/p^2. A bit more precisely,

\displaystyle{\prod_{p\equiv 1\text{ mod } 4,p\leq N}(1+1/p)\sim C\sqrt{N}}


\displaystyle{\prod_{p\equiv 1\text{ mod } 4,p\leq N}(1-1/p)\sim C'/\sqrt{N}}

which implies that

\displaystyle{\prod_{p\leq N}\beta_p}\sim CC'

so this product is convergent, although not absolutely. I guess the constant C' is e^{-\gamma/2}, see Mertens theorem, and that C=e^{\gamma/2}\prod_{p\equiv 1\text{ mod } 4}(1-p^{-2}). I don’t really know how to compute this convergent product. We quickly notice that \beta_2=1. If the product could be equal to 4/\pi, which it can’t unfortunately, being smaller than 1, we would apparently get the correct result, but this remains a quite dodgy case, which should make one cautious about stating ambitious generalisations of the Hardy-Littlewood conjecture.