A recent breakthrough due to Croot, Lev and Pach, recycled by Ellenberg-Gijswijt and streamlined by Tao in this blog post, revolutionised the landscape of Roth-type problems in vector spaces over finite fields. Indeed, Ellenberg-Gijswijt showed that a set A\subset \mathbb{F}_3^n free of triples satisfying x_1+x_2+x_3=0 can have at most 2.756^n elements out of the 3^n out there. Here we show that the same method can deal with sets free of solutions to various other equations in any \mathbb{F}_p^n (given that if q=p^k, we have \mathbb{F}_q^n\sim \mathbb{F}_p^{kn} as a vector space, there is no point in considering general \mathbb{F}_q^n), and we make explicit the dependence on p of the resulting bound. In the sequel we refer mostly to Tao’s symmetric proof rather than to the original articles. One great merit of this exposition is to reveal, thanks to the extension of the notion of rank from matrices (seen as functions of two variables) to functions of more variables, how the scope of the method can be extended.

Other linear equations

In fact the bound found for the sets without 3-term arithmetic progressions is also valid
for a set A\subset \mathbb{F}_p^n having no non-trivial solutions to

\displaystyle{\phi_1(x_1)+\phi_2(x_2)+\phi_3(x_3)=0}

where \phi_1,\phi_2,\phi_3 are three linear automorphisms of \mathbb{F}_p^n summing to 0. Indeed,
we can then still write
\displaystyle{\delta_{0^n}(\phi_1(x_1)+\phi_2(x_2)+\phi_3(x_3))=\sum_{a\in A}\delta_a(x_1)\delta_a(x_2)\delta_a(x_3).}
Even more generally, for any integer k\geq 3, if \phi_1,\ldots,\phi_k are k automorphisms of \mathbb{F}_p^n which sum to 0, and if A\subset \mathbb{F}_p^n does not have any non-trivial solution to the equation \sum_{i\in[k]}\phi_i(x_i)=0, we can still write

\displaystyle{ \delta_{0^n}(\sum_{i\in[k]}\phi_i(x_i))=\sum_{a\in A}\prod_{i\in[k]}\delta_a(x_i). }

The right-hand side is still a function on A^k of rank \lvert A\rvert. Meanwhile, the left-hand side may still be expanded using the formula \delta_{0}(x)=1-x^{p-1} valid for all x\in\mathbb{F}_p, which implies that the LHS equals

\displaystyle{ \prod_{j\in[n]}(1-(\sum_{i\in[k]}\phi_{i}^j(x_i))^{p-1}) }

where \phi_i^j means the j-th coordinate of \phi_i, so that each \phi_i^j(x_i) is a linear polynomial in the coordinates of x_i. This implies that the whole expression is a polynomial of degree at most (p-1)n. The treatment shown by Tao on his blog to bound the rank of this expression can thus be applied here too.

Roth’s theorem is simply the case k=3 with, for p=3, \phi_1=\phi_2=\phi_3=\text{Id}, or for general p, \phi_1=\phi_2=\text{Id} and \phi_3=-2\text{Id}.

The more variables, the smaller the bound becomes. For instance, a set A\subset \mathbb{F}_5^n without any solution to the equation x+3y=2z+2w, in particular without any 3-APs, has size at most 3.83^n, less than the bound 4.46^n for sets without 3-APs.
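These constants can be reproduced numerically. Anticipating the entropy optimisation carried out in the last section, the sketch below (the helper name `rate` is mine) parametrises the optimal distribution as \alpha_i\propto q^i and locates q by bisection; for a balanced equation in k variables over \mathbb{F}_p the constraint on the mean is \sum i\alpha_i\leq (p-1)/k.

```python
import math

def rate(p, k):
    """Exponential rate exp(c) of the bound for a balanced equation in
    k variables over F_p: maximise -sum alpha_i*log(alpha_i) over
    probability vectors on {0,...,p-1} with sum i*alpha_i <= (p-1)/k.
    The maximiser has alpha_i proportional to q^i; bisect on q."""
    target = (p - 1) / k          # the constraint binds as soon as k >= 3
    lo, hi = 1e-9, 1.0
    for _ in range(200):
        q = (lo + hi) / 2
        w = [q**i for i in range(p)]
        mean = sum(i * wi for i, wi in enumerate(w)) / sum(w)
        if mean < target:
            lo = q
        else:
            hi = q
    q = (lo + hi) / 2
    z = sum(q**i for i in range(p))
    # entropy of alpha_i = q^i/z equals log(z) - log(q)*mean
    return math.exp(math.log(z) - math.log(q) * target)

print(round(rate(3, 3), 3))   # 2.755, the Ellenberg-Gijswijt bound
print(round(rate(5, 3), 2))   # 4.46, sets without 3-APs in F_5^n
print(round(rate(5, 4), 2))   # 3.83, the four-variable equation in F_5^n
```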

Even non-linear equations

Upon scrutinizing the polynomial method at work in Ellenberg-Gijswijt, one can be struck by the fact that the translation-invariance of the equation x+y+z=0 is not even used. In Meshulam’s theorem, as in any form of density-increment strategy, we crucially use the fact that if (x,y,z) is a solution and if c\in\mathbb{F}_p^n is any constant, then (x+c,y+c,z+c) is a solution too. What is used in the polynomial strategy is that any constant tuple (c,\ldots,c) is a solution, which is vastly weaker (though the two are equivalent for linear patterns). Thus if \phi_1,\phi_2,\ldots,\phi_k are any bounded-degree polynomial maps \mathbb{F}_p^n \longrightarrow\mathbb{F}_p^m (i.e. maps whose coordinates are polynomials of degree at most d) summing to 0, then the method should, in principle, spit out a bound for the maximal size of a set A\subset \mathbb{F}_p^n free of solutions to \sum\phi_i(x_i)=0, depending on the degree. The only thing that changes in Tao’s write-up is that the polynomial


\displaystyle{ \prod_{j\in[m]}\left(1-\left(\sum_{i\in[k]}\phi_{i}^j(x_i)\right)^{p-1}\right) }

whose rank we try to bound is now of degree up to (p-1)md. Thus the relevant parameter, which determines whether the method succeeds or not, is m/n \cdot d/k: it has to be significantly less than 1/2. Otherwise the method is hopeless, for it requires a good bound (exponential if possible) on

\displaystyle{ \mathbb{P}(X_1+\cdots+X_n\leq (p-1)m d/k) }

where X_1,\ldots, X_n are independent, identically distributed variables on \{0,\ldots,p-1\}, and such a bound is only available when the threshold (p-1)md/k is significantly below the mean (p-1)n/2 of the sum.

Thus, considering m=2n and d=2, we have md/(nk)=4/k, so we need k=9 variables.

Equations in a polynomial ring

As advocated by Tom Bloom (for instance in this paper), it might be of interest to ask for analogues of such theorems in polynomial rings. However, a relation of the form f_1+f_2=2f_3 among three polynomials over \mathbb{F}_q of degree less than n is simply a relation among three vectors, namely the vectors of their coefficients. But we can form much more general equations, such as

\displaystyle{\sum_{i=1}^k\alpha_i f_i=0}

for some \alpha_1,\ldots,\alpha_k \in \mathbb{F}_p[t]. We require again \sum_i\alpha_i=0. Now the map f\mapsto \alpha f on the polynomials of degree less than n translates, in terms of the coefficients of a polynomial, into a linear map \phi from \mathbb{F}_p^n to \mathbb{F}_p^{n+d} where d= \text{deg }\alpha.
Let d be the maximum of the degrees of the \alpha_i and let us introduce the maps \phi_i :\mathbb{F}_p^n\rightarrow \mathbb{F}_p^{n+d} corresponding to multiplication by \alpha_i. Asking for the maximal size of a set A\subset \mathbb{F}_p[t] of polynomials of degree less than n without any solution to the equation above then amounts to asking for the maximal size of a set A\subset \mathbb{F}_p^n without any solution to the equation \sum_i\phi_i(x_i)=0, a situation we have already analysed.
Even better, translating the map f\mapsto f^2 into a polynomial map \mathbb{F}_p^n\rightarrow \mathbb{F}_p^{2n-1}, namely

\displaystyle{ (a_0,\ldots,a_{n-1})\mapsto \left(\sum_{i+i'=j}a_ia_{i'}\right)_{0\leq j\leq 2n-2}, }

we’re back to the previous paragraph. Here m=2n-1 and d=2, so that to ensure md/(nk)<1/2, we need k\geq 9. We then get a bound \lvert A\rvert \ll (0.99 p)^n. This number of variables is terrible. In fact, in the integers, Browning and Prendiville showed that 5 variables suffice for such an equation, that is, they got a bound for the size of a set A\subset [N] without solutions to the equation \sum_{i=1}^4 x_i^2=4x_5^2. It is not satisfactory to end up needing more variables in the function field setting than in the integer case. The polynomial method does not seem well suited to tackling polynomial equations.

Dependence on p

Let’s return to the case of Roth’s theorem.
Following Tao, one gets a bound of the form \lvert A \rvert\ll n^{p-1}\exp(nc_p)
where c_p is the maximum of

\displaystyle{ h(\alpha_0,\ldots,\alpha_{p-1})=-\sum_{k=0}^{p-1}\alpha_k\log\alpha_k}

under the constraints

\displaystyle{ \alpha_i\geq 0, \sum \alpha_i=1,\sum i\alpha_i\leq (p-1)/3}

It could be interesting to know how c_p evolves with p.
Using the help of a computer to compute c_p for p up to 10^8, I got the feeling that \exp(c_p)/p\approx 0.84+0.30/p. This would mean that a subset A\subset \mathbb{F}_p^n free of 3-term arithmetic progressions necessarily satisfies \lvert A \rvert\ll (0.85p)^n.
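This experiment is easy to reproduce. The sketch below (my own code, not the original script) uses the parametrisation \alpha_i\propto q^i coming from the Lagrange-multiplier analysis carried out next, and locates q by bisection on the mean of the distribution.

```python
import math

def c_p(p):
    """Maximum of -sum alpha_i*log(alpha_i) over probability vectors
    on {0,...,p-1} subject to sum i*alpha_i <= (p-1)/3.  The optimum
    has alpha_i proportional to q^i for some q in (0,1)."""
    target = (p - 1) / 3
    lo, hi = 1e-9, 1.0
    for _ in range(100):
        q = (lo + hi) / 2
        w = [q**i for i in range(p)]
        if sum(i * wi for i, wi in enumerate(w)) / sum(w) < target:
            lo = q
        else:
            hi = q
    q = (lo + hi) / 2
    z = sum(q**i for i in range(p))
    return math.log(z) - math.log(q) * target  # entropy at the optimum

for p in (101, 1009, 10007):
    print(p, round(math.exp(c_p(p)) / p, 4))   # decreases towards ~0.8414
```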

We now proceed to show that the empirically conjectured behaviour of c_p is indeed the true one.

The method of Lagrange multipliers implies that the maximum of h is attained at a point where

\displaystyle{\overrightarrow{\mathrm{grad}}\, h=-(\log\alpha_0+1,\ldots,\log\alpha_{p-1}+1)}

belongs to \text{Vect}((1,\ldots,1),(0,1,\ldots,p-1)). This condition amounts to the existence of a,b\in\mathbb{R} such that \alpha_i=\exp(1-a-ib) for all indices i. So we have only two degrees of freedom for the tuple (\alpha_0,\ldots,\alpha_{p-1}). And because we have two equations

\displaystyle{ \sum \alpha_i=1,\sum i\alpha_i= (p-1)/3}

there should be a unique such tuple.
Putting q=\exp(-b), we transform the first constraint into

\displaystyle{ \exp(1-a)\sum_{i=0}^{p-1} q^i=1 }

thus the second one amounts to

\displaystyle{ \sum_{i=0}^{p-1}iq^i=\frac{p-1}{3}\sum q^i. }

Here the LHS equals

\displaystyle{ \frac{((p-1)q-p)q^p+q}{(q-1)^2}. }

Thus the equation to solve is

\displaystyle{ ((p-1)q-p)q^p+q=\frac{(p-1)(q-1)}{3}(q^p-1), }

in other words

\displaystyle{ 2(p-1)q^{p+1}-(2p+1)q^p+3q+(p-1)q=(p-1). }

It clearly imposes that q\rightarrow 1 as p\rightarrow\infty: if q remained bounded away from 1, the terms involving q^p would vanish and the left-hand side would be asymptotic to (p-1)q, which is not asymptotic to the right-hand side p-1.

We now rearrange the equation as

\displaystyle{ p(q+2q^{p+1}-2q^p)-2q^{p+1}-q^p+2q=p-1 }

that is

\displaystyle{ p(q-1)(1+2q^p)=2q^{p+1}+q^p-2q-1=(q^p-1)(2q+1). }
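These successive rearrangements can be double-checked numerically: the sketch below verifies, for random q\in(0,1) and several p, that the residual of the second displayed form is three times that of the first (we multiplied by 3 along the way), and equals that of the third.

```python
import random

random.seed(0)
for p in (3, 5, 17, 101):
    for _ in range(100):
        q = random.random()
        # residual of ((p-1)q - p)q^p + q = (p-1)(q-1)/3 * (q^p - 1)
        f1 = ((p - 1) * q - p) * q**p + q - (p - 1) * (q - 1) / 3 * (q**p - 1)
        # residual of 2(p-1)q^{p+1} - (2p+1)q^p + 3q + (p-1)q = p - 1
        f2 = (2 * (p - 1) * q**(p + 1) - (2 * p + 1) * q**p
              + 3 * q + (p - 1) * q - (p - 1))
        # residual of p(q-1)(1 + 2q^p) = (q^p - 1)(2q + 1)
        f3 = p * (q - 1) * (1 + 2 * q**p) - (q**p - 1) * (2 * q + 1)
        assert abs(3 * f1 - f2) < 1e-9 and abs(f2 - f3) < 1e-9
print("all rearrangements agree")
```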

Let us introduce \epsilon=1-q\rightarrow 0 and determine the behaviour of p\epsilon, or equivalently of q^p. One can prove by contradiction that p\epsilon cannot tend to 0. Indeed, supposing it did, one could write

\displaystyle{ q^p=\exp(p\log(1-\epsilon))=\exp(-p\epsilon)\exp(O(p\epsilon^2)) =1-p\epsilon+(p\epsilon)^2/2+o((p\epsilon)^2). }

The equation p(q-1)(1+2q^p)=(q^p-1)(2q+1) would then become


\displaystyle{ -p\epsilon(3-2p\epsilon+(p\epsilon)^2+o((p\epsilon)^2))=(-p\epsilon+(p\epsilon)^2/2+o((p\epsilon)^2))(3-2\epsilon) }

and comparing the terms in (p\epsilon)^2, whose coefficient is 2 on the left-hand side but 3/2 on the right-hand side, we see an absurdity.
Nor can p\epsilon tend to \infty: then q^p\rightarrow 0, so the left-hand side -p\epsilon(1+2q^p) would be unbounded while the right-hand side tends to -3. If it has a limit point \alpha, it has to be a solution to

\displaystyle{ -\alpha(1+2\exp(-\alpha))=3(\exp(-\alpha)-1). }


The computer declares that the non-zero solution to this equation equals 2.149125799907062 (which is very close, fortunately enough, to the values observed for p(1-q) for large p). Thus q^p tends to \exp(-\alpha)\approx 0.117.
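For the record, here is a bisection sketch confirming this value; note that \alpha=0 is also a root of the equation, so we search away from it, in [1,3], where the sign of the residual changes exactly once.

```python
import math

def f(a):
    # residual of -a(1 + 2 exp(-a)) = 3(exp(-a) - 1)
    return 3 * (1 - math.exp(-a)) - a * (1 + 2 * math.exp(-a))

lo, hi = 1.0, 3.0          # excludes the trivial root a = 0
for _ in range(60):
    mid = (lo + hi) / 2
    if f(mid) > 0:          # f is positive to the left of the root
        lo = mid
    else:
        hi = mid
alpha = (lo + hi) / 2
print(alpha)                # ~2.149125799907062
print(math.exp(-alpha))     # ~0.1166
```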

Injecting the expression for the \alpha_i in terms of a,b, the optimum is

\displaystyle{ \sum_k(a+kb-1)\exp(1-a-kb) = \exp(1-a)\left((a-1)\sum_k q^k+b\sum_k kq^k\right) }

which we can equate, remembering the two constraints, to

\displaystyle{ a-1+\frac{(p-1)b}{3}. }

So let us derive an asymptotic expression for a and b. We quickly obtain a=\log p+\log\frac{1-\exp(-\alpha)}{\alpha}+1+o(1) and b=\alpha/p+o(1/p).
So the maximum we are after equals

\displaystyle{c_p= \log p+\log\frac{1-\exp(-\alpha)}{\alpha}+\alpha/3+o(1) }

This implies that

\displaystyle{\frac{\exp(c_p)}{p}\longrightarrow \frac{1-\exp(-\alpha)}{\alpha}\exp(\alpha/3). }

Injecting the value we found for \alpha, the computer spits out 0.84, as guessed.
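Plugging in the value of \alpha found above, the final arithmetic is a one-liner (a sketch):

```python
import math

alpha = 2.149125799907062   # root of -a(1+2e^{-a}) = 3(e^{-a}-1) found earlier
limit = (1 - math.exp(-alpha)) / alpha * math.exp(alpha / 3)
print(round(limit, 4))      # 0.8414, i.e. 0.84 to two decimal places
```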