Let $V=\mathbb{F}_p^n$ be a vector space over a finite prime field. One knows then which subsets $A\subset V$ satisfy $A+A=A$, the sets that are stable under addition: these are the kernels of linear maps, that is, each such set $A$ is the zero set of a bunch of linear forms.

Now we consider a two-coordinates analogous problem, that is, we consider a set $A\subset V\times V$ and define the sets

$\displaystyle{P\stackrel{V}{+}P= \{(x,y_1+ y_2)\mid (x,y_1),(x,y_2)\in P\}}$

and $P\stackrel{H}{+}P$, V and H meaning vertical and horizontal. What are the sets that satisfy both $P\stackrel{V}{+}P=P$ and $P\stackrel{H}{+}P=P$? Call such a set a transverse set. Natural exemples are cartesian products of vector spaces (which we shall call rectangles) as well as zero sets of bilinear forms. More generally, a set of the form $\displaystyle{\{(x,y)\in W_1\times W_2\mid Q_1(x,y)=\cdots=Q_r(x,y)=0\}}$ for some subspaces $W_2,W_2\leq V$ and some bilinear forms $Q_i$ is horizontally and vertically closed. Call such a set a bilinear set. Is it the only example?

For a transverse set $P\subset V\times V$ and $x\in V$, let $P_x$ be the fiber above x. Thus $P_x$ is a vector space. Actually $P_x$ depends only the projective class $[x]\in P(V)$. Moreover, the stability under horizontal operations is equivalent to $P_{x+y}\supset P_x\cap P_y$; more generally if $[z]$ is on the projective line spanned by $x$ and $y$, we have $P_z\supset P_x\cap P_y$.

The codimension 1 case

Suppose $P_0=V$ while all other subspaces $P_x$ are hyperplanes. Write $P_x=\xi(x)^\perp$ with $\xi(x)\in P(V)$ some vector uniquely defined up to homothety. In other words, we have a map $\xi : P(V)\rightarrow P(V)$ satisfying the property that if $z$ is on the projective line spanned by $x$ and $y$, then $\xi_{z}$ lies on the projective line spanned by $\xi_x$ and $\xi_y$. Thus $\xi$ maps projective lines onto lines.

If we suppose it is injective, too, then by the fundamental theorem of projective geometry, we conclude that it is a projective map. Thus it is induced by a linear map $\xi : V\rightarrow V$. Hence writing $Q(x,y)=\xi(x)\cdot y$, we have shown that $P$ is the zero set of the bilinear form $Q$.

In general $\xi$ is far from being injective. It can for instance be constant on $P(V)$ (equal to $H^\perp$ for some hyperplane $H$). In this case $P=\{0\}\times V\cup V\times H$. This is neither a rectangle, nor the zero form of a quadratic form, because if it was, $\xi$ could be taken linear, and because $\xi_x=0\leftrightarrow x=0$, it should be injective. However $P$ obviously contains a very large rectangle.

I can at least find a large bilinear set in the transverse set $P$. If $\xi$ is not injective on $P(V)$, there exist two linearly independent vectors $x,x'$ where $\xi$, takes the same value, which it then takes at least on $\text{span}(x,x')\setminus\{0\}$. Take a complement $V_1$ of this plane. If there $\xi$ is injective, we’re done. Otherwise, go on. After $k$ steps, we have $D_1,\cdots,D_k$ planes whose sum $D$ is of dimension 2k and for which there is a subspace $H$ of codimension at most $k$ such that $D\times H$ is in $P$. Either we never find injectivity before $k=n/3$ steps, in which case we basically end up with a rectangle $W_1\times W_2$ made up from two subspaces of dimension $2n/3$, or we find injectivity on a space $W$ of dimension at least $n/3$. Then we have a bilinear set $\{(x,y)\in W\times V\mid Q(x,y)=0\}$ for some bilinear form $Q$.

Arbitrary codimension

A more general potential counterexample, communicated to me by Ben Green, is a set of the form $\bigcup_i V_i\times W_i$ where $V_i, W_i$ are sequences of increasing (resp. decreasing) subspaces.  Suppose for instance that $V_0=W_r=\{0\}$ while $W_0=V_r=V$. Thus for $x\in V$, we have $P_x=W_{i(x)}$ where $i(x)=\min\{i\mid x\in V_i\}$. I can’t quite prove it’s not a bilinear set though.

Anyway it contains a large rectangle. Indeed, one of the $V_i\times W_i$ must have size at least $\lvert P\rvert /n$.

Given a compact abelian group $G$, endowed with a Haar probability measure $\mu$, let $\mathcal{F}_a$ be the set of (Borel) measurable functions $G\rightarrow [0,1]$ satisfying $\int_G fd\mu \geq \alpha$. The average of $f$ is called its density.

Let $\Psi$ be system of linear forms, i.e $\Psi=(\psi_1,\ldots,\psi_t): G^d\rightarrow G^t$ and

$\displaystyle{\psi_i(x_1,\ldots,x_d)=\sum_{j=1}^dc_{i,j}x_j}$

where $c_{i,j}\in\mathbb{Z}$ for any $i\in[t]$ and $j\in[d]$. For a measurable function $f : G\rightarrow [0,1]$, one may define the $\Psi$-count of $f$ as

$\displaystyle{T_\Psi(f)=\int_{G^d}\prod_i f(\psi_i(x))d\mu(x).}$

Here by a slight abuse of notation, the measure $\mu$ on $G^d$ denotes in fact the product measure $\mu^{\otimes d}$ of the Haar measure on $G$.

Let

$\displaystyle{m_\Psi(\alpha,G)=\inf_{f\in\mathcal{F}_a}T_\Psi(f).}$

When a system $\Psi$ is fixed, we may omit it from the notation $m_\Psi$. Given a sequence $G_n$ of finite groups of cardinality tending to infinity, it is natural to consider the sequence $m(\alpha,G_n)$. When the infimum is taken only over $\{0,1\}$-valued measurable functions of density at least $\alpha$, this is the minimum number of APs a set $A\subset G_n$ of density at least $\alpha$ can have. But these infimum are actually the same (basically by convexity).

The case $G_n=\mathbb{Z}/n\mathbb{Z}$ was introduced by Croot. He proved that the minimal density $m(\alpha,\mathbb{Z}_N)$ of three-term arithmetic progressions converged as $N\rightarrow\infty$ through the primes. Sisask, in his PhD thesis, was able to express this limit as an integral over the corresponding limit object, namely the circle $\mathbb{T}=\mathbb{R}/\mathbb{Z}$, thus

$\displaystyle{\lim_{\substack{N\rightarrow +\infty\\ N\text{ prime}}} m(\alpha,\mathbb{Z}_N)=m(\alpha,\mathbb{T}).}$

As customary, the corresponding problem in finite field models is cleaner. Thus we set $G_n=\mathbb{F}_p^n$. Whereas Croot noticed that the sequence $m(\alpha,\mathbb{Z}_N)$ was in some sense approximately decreasing, it is easy to see that for any system $\Psi$, the sequence $m_\Psi(\alpha,G_n)$ is genuinely non-increasing. This is because $\mathbb{F}_p^n$ embeds naturally in $\mathbb{F}_p^{n+1}$; thus if $f$ is a map from $\mathbb{F}_p^n$ to $[0,1]$, it can be extended into $\tilde{f} : \mathbb{F}_p^{n+1}\rightarrow [0,1]$ by putting $\tilde{f}(x_1,\ldots,x_{n+1})=f(x_1,\ldots,x_n)$.

Now $\mathbb{E} \tilde{f}=\mathbb{E}f$, and $T_\Psi(f)=T_\Psi(f)$. This proves that $m_\Psi(\alpha,G_n)\geq m_\Psi(\alpha,G_{n+1})$, which implies that this sequence has a limit.

Let $\Psi$ be now fixed. We want to find an appealing expression for the limit of the sequence $m(\alpha,G_n)$. It is thus tempting to look for a natural topological compact group that would play the role of limit object of the sequence $G_n$, just as $\mathbb{T}$ played the role of limit object for the sequence $\mathbb{Z}/N\mathbb{Z}$.

The analogy $\mathbb{Z}\leftrightarrow \mathbb{F}_p[t]=\cup \mathbb{F}_p^n=\mathbb{F}_p^{\infty}$ leads one to introducing $G_\mathbb{N}=\mathbb{F}_p^\mathbb{N}$ and $G_\infty=\mathbb{F}_p^{\infty}$. The group $G_\mathbb{N}$ equipped with the product topology is a compact group by Tychonoff’s theorem, and $G_\infty$ equipped with the discrete topology is locally compact; it is the Pontryagin dual of $G_\mathbb{N}$. It is now reasonable to ask whether

$\displaystyle{\lim m(\alpha,G_n)=m(\alpha, G_\mathbb{N}).}$

One easily notices that $m(\alpha,G_n)\geq m(\alpha, G_\mathbb{N})$ for any $n\in\mathbb{N}$ by extending, as above, a function $f : \mathbb{F}_p^n \rightarrow [0,1]$ into a function $\tilde{f}$ on $\mathbb{F}_p^\mathbb{N}$ with the same density and the same $\Psi$-count. Indeed, one can define

$\displaystyle{\tilde{f}((x_i)_{i\in\mathbb{N}})=f(x_1,\ldots,x_n)}$

and then $\tilde{f}$ is a measurable function which has same density and same $\Psi$-count.

We prove the other inequality. For this we prove two intermediate propositions. We write from henceforth $G=G_\mathbb{N}$.

Proposition 1

If $f_n$ is a sequence of $[0,1]$-valued functions on $G$ that converges in $L^2(G)$ to a $[0,1]$-valued function $f$, then $T(f_n)\rightarrow T(f)$.

Proof

Let $F_n(x)=\prod_{i\in[t]} f_n(\psi_i(x))-\prod_{i\in[t]} f(\psi_i(x))$. We show that $F_n$ converges in $L^1(G^d)$ to 0, which will conclude. In fact we prove a stronger convergence, a convergence in $L^2$. We show that $\lVert F_n-F\rVert_{2}\leq t \lVert f_n-f\rVert_2$. For that we simply remark that

$\displaystyle{F_n(x)=\sum_{j=1}^{t}\prod_{ij}f(\psi_k(x)).}$

Now, any non-trivial linear map $\theta : G^d\rightarrow G$, i.e. any map of the form

$\displaystyle{\theta(x_1,\ldots,x_d)=\sum_{i=1}^d c_ix_i.}$

with $(c_1,\ldots,c_d)\in \mathbb{F}_p^d\setminus\{0\}$, preserves the Haar measure, which implies that

$\displaystyle{\lVert f_n\circ\psi_j-f\circ\psi_j\rVert_{L^2(G^d)}=\lVert f_n-f\rVert_{L^2(G)}.}$

Using this, the triangle equality and the fact that the functions are 1-bounded, we conclude.

Proposition 2

Let $\mathcal{F}_{a,n}$ be the set of functions $\tilde{f}$ for $f : G_n\rightarrow [0,1]$ of density at least $\alpha.$ Then $\mathcal{F}_{a,n}$ is dense in $(\mathcal{F}_a, \lVert \cdot\rVert_2).$

Proof

Let $f\in\mathcal{F}_a$We construct a sequence of functions $f_n\in\mathcal{F}_{a,n}$ that converge to $f.$ We use duality and Fourier transform. We have $\hat{G_n}=G_n$ and $\hat{G_\mathbb{N}}=G_\infty=\cup G_n.$

For $\chi\in G_\infty$, write

Then write

This defines a function $f_n : G_n \rightarrow \mathbb{C}$, which obviously satisfies

.

The same holds for its extension $\tilde{f_n}$Notice that $\tilde{f_n}$ is also the conditional expectation with respect to the $\sigma$-algebra $\mathcal{P}(G_n)$More precisely,

$\displaystyle{f_n(x_1,\ldots,x_n)=\mathbb{E}_{y\in G}(f(y)\mid (y_1,\ldots,y_n)=(x_1,\ldots,x_n))}$

as both equal .

This implies in particular that $f_n$, and thus $\tilde{f_n}$has its values in $[0,1]$.

To check that $\mathbb{E} f_n \geq \alpha$, we use the conditional expectation interpretation

$\displaystyle{\mathbb{E}[f_n]=\mathbb{E}[\mathbb{E}[f\mid G_n]]=\mathbb{E}[f]\geq \alpha.}$

Now we check the convergence. It follows from basic harmonic analysis: $f$ being in $L^2(G)$, its Fourier series $f_n$ converges to $f$ in $L^2(G)$. This concludes the proof of Proposition 2.

Thus even for 4-APs, this shows that

$\displaystyle{\lim_{n\rightarrow\infty}\inf_{f:\mathbb{F}_p^n\rightarrow[0,1],\mathbb{E}f\geq\alpha}\mathbb{E}_{x,y}f(x)f(x+y)f(x+2y)f(x+3y)}$

exists and equals

$\displaystyle{\inf_{f:\mathbb{F}_p^\mathbb{N}\rightarrow[0,1],\int f\geq \alpha}\int f(x)f(x+y)f(x+2y)f(x+3y)dxdy.}$

This is in contrast to a paper of Candela-Szegedy, for the same pattern $\Psi$ (4-APs) but with $G_n=\mathbb{Z}/n\mathbb{Z}$ where they obtain as a limit a much more complicated infimum, an infimum over functions $f$ on $\mathbb{T}^2$ of the $\Psi'$-count, where $\Psi'\neq\Psi$ is another pattern. The part that holds in the setting of $\mathbb{F}_p^n$ and not in the setting of $\mathbb{Z}_p$ is Proposition 2. The truncated Fourier series of $f : \mathbb{T}\rightarrow[0,1]$, which is the natural $L^2$ approximation, does not give a function on cyclic groups $\mathbb{Z}_p$ with the same density and AP-count as $f$.

We remark that the paper of Hatami-Hatami-Hirst implies that the sequence of functions $f_n$ admits a limit object $g$ which is defined on a finite cartesian power of $G_\mathbb{N}$.

In fact, if $\Psi$ is the $k$-APs, pattern of complexity $k-2$, and $p\geq k$, the limit object $g$ is defined on $G_\mathbb{N}^{k-2}$. In particular, if $k=3$, this means that the infimum above is attained, i.e.

$\lim_{n\rightarrow \infty}m(\alpha, G_n)=T(g)$

for some $g : \mathbb{F}_p^\mathbb{N}\rightarrow [0,1]$ measurable. When , it is still not known whether such a result (i.e. the existence of a limit object $g : \mathbb{T}\rightarrow [0,1]$) holds. However, Szegedy proves the existence of a limit object $g : X\rightarrow [0,1]$ on a non-explicit group $X$.

A recent breakthrough due to Croot, Lev and Pach, recycled by Ellenberg-Gijswijt and streamlined by Tao on this blog post, revolutionised the landscape of Roth’s type problems in vector spaces over finite fields. Indeed, Ellenberg-Gijswijt obtained that a set $A\subset \mathbb{F}_3^n$ free of triple satisfying $x_1+x_2+x_3=0$ can have at most $2.756^n$ elements out of the $3^n$ out there. Here we show the same method can deal with sets free of solutions to various other equations in any $\mathbb{F}_p^n$ (given that if $q=p^k$, we have $\mathbb{F}_q^n\sim \mathbb{F}_p^{kn}$ as a vector space, there’s no point in considering general $\mathbb{F}_q^n$) and we make explicit the dependence on $p$ of the resulting bound. We refer in the sequel mostly to Tao’s symmetric proof instead of the original articles. One great merit of this exposition is to reveal, thanks to the extension of the notion of rank from matrices (seen as functions of two variables) to functions of more variables, how the scope of the method can be extended.

Other linear equations

In fact the bound found for the sets without 3-term arithmetic progressions is also valid
for a set $A\subset \mathbb{F}_p^n$ having no non-trivial solutions to
$\displaystyle{\phi_1(x_1)+\phi_2(x_2)+\phi_3(x_3)=0}$
where $\phi_1,\phi_2,\phi_3$ are three linear automorphisms of $\mathbb{F}_p^n$ summing to 0. Indeed,
we can then still write
$\displaystyle{\delta_{0^n}(\phi_1(x_1)+\phi_2(x_2)+\phi_3(x_3))=\sum_{a\in A}\delta_a(x_1)\delta_a(x_2)\delta_a(x_3).}$
Even more generally, for any integer $k\geq 3$, if $\phi_1,\ldots,\phi_k$ are $k$ automorphisms of $\mathbb{F}_p^n$ which sum to 0, and if $A\subset \mathbb{F}_p^n$ does not have any non-trivial solution to the equation $\sum_{i\in[k]}\phi_i(x_i)=0$, we can still write

$\displaystyle{ \delta_{0^n}(\sum_{i\in[k]}\phi_i(x_i))=\sum_{a\in A}\prod_{i\in[k]}\delta_a(x_i). }$

The right-hand side is still a function on $A^k$ of rank $\lvert A\rvert$. Meanwhile, the left-hand side may still be expanded using the formula $\delta_{0}(x)=1-x^{p-1}$ valid for all $x\in\mathbb{F}_p$, which implies that the LHS equals

$\displaystyle{ \prod_{j\in[n]}(1-(\sum_{i\in[k]}\phi_{i}^j(x_i))^{p-1}) }$

where $\phi_i^j$ means the $j$-th coordinate of $\phi_i$; it is
thus a linear polynomial. This implies that the whole expression is a polynomial of degree $(p-1)n$. The treatment shown by Tao on his blog to bound the rank of this expression can thus be applied here too.

Roth’s theorem is simply the case where $k=3$ and $\phi_1=\phi_2=\phi_3=\text{Id}$ for $p=3$, or more generally $\phi_1=\phi_2=\text{Id}$ and $\phi_3=-2\text{Id}$.

The more variables, the smaller the bound becomes. For instance, a set $A\subset \mathbb{F}_5^n$ without any solution to the equation $x+3y=2z+2w$, in particular without any 3-APs, has size at most $3.83^n$, less than the bound for sets without 3-APS which is $4.46^n$.

Even non-linear equations

Upon scrutinizing the polynomial method at work in Ellenberg-Gijswijt, one can be struck by the fact that the translation-invariance of the equation $x+y+z=0$ is not even used. In Meshulam’s theorem, as in all form of density increment strategy, we crucially use the fact that if $(x,y,z)$ is a solution and if $c\in\mathbb{F}_p^n$ is any constant, then $(x+c,y+c,z+c)$ is a solution too. What is used in the polynomial strategy is that any constant tuple $(c,\ldots,c)$ is a solution, which is vastly weaker (though it is synonymous for linear patterns). Thus if $\phi_1,\phi_2,\ldots,\phi_k$ are any bounded-degree polynomial maps $\mathbb{F}_p^n \longrightarrow\mathbb{F}_p^m$ (i.e. maps whose coordinates are polynomials of degree at most $d$) summing to 0, then the method should, in principle, spit out a bound for the maximal size of a set $A\subset \mathbb{F}_p^n$ free of solutions to $\sum\phi_i(x_i)=0$, depending on the degree. The only thing that changes in Tao’s write-up is that the polynomial

$\displaystyle{\delta_{0^m}(\sum_{i=1}^k\phi(x_i))\in\mathbb{F}_p[(x_{i,j})_{i\in[k],j\in[n]}}]$

whose rank we try to bound is now of degree up to $(p-1)md$. Thus the parameter of relevance and which determines whether the method will succeed or not is $m/n \cdot d/k$. It has to be significantly less than $1/2$. Otherwise, the method requiring a good bound (exponential if possible) on

$\mathbb{P}(X_1+\cdots+X_n\leq (p-1)m d/k)$

where $X_1,\ldots, X_n$ are independent, identically distributed variable on $\{0,\ldots,p-1\}$, we are hopeless.

Thus, considering $m=2n,d=2$, we need $k=9$ variables.

Equations in a polynomial ring

As advocated by Tom Bloom (for instance in this paper), it might be of interest to ask for analogues of theorems in polynomial ring. However, a relation of the form $f_1+f_2=2f_3$ among three polynomials over $\mathbb{F}_q$ of degree less than $n$ is simply a relation among three vectors, the vectors of their coefficients. But we can form much more general equations, such as

$\displaystyle{\sum_{i=1}^k\alpha_i f_i=0}$

for some $\alpha_1,\ldots,\alpha_k \in \mathbb{F}_p[t]$. We require again $\sum_i\alpha_i=0$. Now the map $f\mapsto \alpha f$ on the polynomials of degree less than $n$ translates, in term of the coefficients of a polynomial, into a linear map $\phi$ from $\mathbb{F}_p^n$ to $\mathbb{F}_p^{n+d}$ where $d= \text{deg }\alpha$.
Let $d$ be the maximum of the degrees of the $\alpha_i$ and let us introduce the maps $\phi_i :\mathbb{F}_p^n\mapsto \mathbb{F}_p^{n+d}$ corresponding to asking for the size of $A\subset \mathbb{F}_p[t]$ without any solution to the equation above amounts to asking for the size of a set $A\subset \mathbb{F}_p^n$ without any solution to the equation $\sum_i\phi_i(x_i)=0$, a situation we have already analysed.
Even better, translating the map $f\mapsto f^2$ into a a polynomial map $\mathbb{F}_p^n\mapsto \mathbb{F}_p^{2n-1}$, namely
$\displaystyle{\phi(f_0,\ldots,f_{n-1})=(f_0^2,2f_0f_1,f_1^2+f_0f_1,\ldots,f_{n-1}^2)}$
we’re back to the previous paragraph. Here $m=2n-1,d=2$, so that to ensure $md/nk<1/2$, we need $k\geq 9$. We then get a bound $\lvert A\rvert \ll (0.99 p)^n$. This number of variables is terrible. In fact, in the integers, Browning and Prendiville showed that 5 variables suffice for such an equation, that is, they got a bound for the size of a set $A\subset [N]$ without solution to the equation $\sum_{i=1}^4 x_i^2=4x_5^2$. It is not satisfactory to end up needing more variables in the function field setting than in the integer case. The polynomial method does not seem good at tackling polynomial equations.

Dependence on $p$

Following Tao, one gets a bound of the form $\lvert A \rvert\ll n^{p-1}\exp(nc_p)$
where $c_p$ is the maximum of

$\displaystyle{ h(\alpha_0,\ldots,\alpha_{p-1})=-\sum_{k=0}^{p-1}\alpha_k\log\alpha_k}$

under the constraints

$\displaystyle{ \alpha_i\geq 0, \sum \alpha_i=1,\sum i\alpha_i\leq (p-1)/3}$

It could be interesting to know how $c_p$ evolves with $p$.
Using the help of a computer to compute $c_p$ for $p$ upto $10^8$, I got the feeling that $\exp(c_p)/p\approx 0.84+0.30/p$. This would mean that a subset $A\subset \mathbb{F}_p^n$ free of 3-term arithmetic progressions necessarily satisfies $\lvert A \rvert\ll (0.85p)^n$.

We now proceed to show that the empirically conjectured behaviour of $c_p$ is indeed the true one.

The constrained extrema theorem implies that the maximum of $h$ is attained on a point where

$\displaystyle{\overrightarrow{\mathrm{grad}} h=-(\alpha_0\log\alpha_0+1,\ldots,\alpha_{p-1}\log\alpha_{p-1}+1)}$

belongs to $\text{Vect}((1,\ldots,1),(0,1,\ldots,p-1))$. This condition amounts to the existence of $a,b\in\mathbb{R}$ so that $\alpha_i\log\alpha_i=\exp(1-a-ib)$ for all indices $i$. So we have only two degrees of freedom for the tuple $(\alpha_0,\ldots,\alpha_{p-1}$. And because we have two equations

$\displaystyle{ \sum \alpha_i=1,\sum i\alpha_i= (p-1)/3}$

there should be a unique such tuple.
Putting $q=\exp(-b)$, we transform the first constraint in

$\displaystyle{ \exp(1-a)\sum_{i=0}^{p-1} q^i=1 }$

thus the second one amounts to

$\displaystyle{ \sum_{i=0}^{p-1}iq^i=\frac{p-1}{3}\sum q^i. }$

Here the LHS equals

$\displaystyle{ \frac{((p-1)q-p)q^p+q}{(q-1)^2}. }$

Thus the equation to solve is

$\displaystyle{ ((p-1)q-p)q^p+q=\frac{(p-1)(q-1)}{3}(q^p-1), }$

in other words

$\displaystyle{ 2(p-1)q^{p+1}-(2p+1)q^p+3q+(p-1)q=(p-1). }$

It clearly imposes that as $p\rightarrow\infty$, we must have $q\rightarrow 1$, for if it remains bounded away from 1, the left-hand side is asymptotic to $(p-1)q$, which is not asymptotic to $p-1$, the right-hand side.

We now rearrange the equation as

$\displaystyle{ p(q+2q^{p+1}-2q^p)-2q^{p+1}-q^p+2q=p-1 }$

that is

$\displaystyle{ p(q-1)(1+2q^p)=2q^{p+1}+q^p-2q-1=(q^p-1)(2q+1). }$

Let us introduce $\epsilon=1-q\rightarrow 0$ and determine the behaviour of $p\epsilon$, or equivalently of $q^p$. One can prove by contradiction that $p\epsilon$ cannot tend to 0. Indeed, supposing it did, one could write

$\displaystyle{ q^p=\exp(p\log(1-\epsilon))=\exp(-p\epsilon)\exp(O(p\epsilon^2)) =1-p\epsilon+(p\epsilon)^2/2+o(p\epsilon)^2. }$

Thus

$\displaystyle{ -p\epsilon(3-2p\epsilon+(p\epsilon)^2+o(p\epsilon)^2)=(-p\epsilon+(p\epsilon)^2/2+o(p\epsilon)^2)(3-2\epsilon) }$

and comparing the term in $(p\epsilon)^2$ we see an absurdity.
It can’t either tend to $\infty$. If it has a limit point $\alpha$, it has to be a solution to

$\displaystyle{ -\alpha(1+2\exp(-\alpha))=3(\exp(-\alpha)-1). }$

The computer declares that the solution to this equations equals 2.149125799907062 (which is very close, fortunately enough, to values observed for $p(1-q)$ for large $p$). Thus $q^p$ tends to $\exp(-\alpha)\approx 0.11$.

Injecting the expression for the $\alpha_i$ in terms of $a,b$,  the optimum is

$\sum_k(a+kb-1)\exp(1-a-kb) = \exp(1-a)\left((a-1)\sum q^k+b\sum kq^k\right)$

which we can equate, remembering earlier formulae, to
$(a-1)+b\frac{p-1}{3}$.
So let us derive an asymptotic expression for $a$ and $b$. We quickly obtain $a=\log p+\log\frac{1-\exp(-\alpha)}{\alpha}+1+o(1), b=\alpha/p+o(1/p)$.
So the maximum we are after equals

$\displaystyle{c_p= \log p+\log\frac{1-\exp(-\alpha)}{\alpha}+\alpha/3+o(1) }$

This implies that

$\displaystyle{\frac{\exp(c_p)}{p}= \frac{1-\exp(-\alpha)}{\alpha}\exp(\alpha/3). }$

Injecting the value we found for $\alpha$, the computer spits out 0.84, as guessed.

Many properties of the set of the primes are inherited by any dense subset thereof, such as the existence of solutions to various linear equations. But there’s one property, recently proved, namely the existence of infinitely many gaps $p_{n+1}-p_n\leq C$ bounded by a constant, which is not inherited by dense subsets of the primes. For instance, it is possible to remove basically one half of the primes and be left with a subset of primes $\{p'_n\mid n\in\mathbb{N}\}$ such that $p'_{n+1}-p'_n\ll \log p_n$.

So instead one could ask if when coloring the primes in finitely many colours, one is guaranteed to find a colour class containing infinitely many pairs of primes whose difference is bounded by a common constant. This does not follow immediately from pigeonhole principle, which only says that one colour class contains infinitely many primes.

However, this follows easily if we recall that the work of Maynard does not only give infinitely many bounded gaps, that is $\liminf p_{n+1}-p_n<+\infty$, but also for any $m$ that $\liminf p_{n+m}-p_n < +\infty$. Let $H(m)$ be this liminf.

In other words there are infinitely many (disjoint) intervals of size $I_n=[x_n,x_n+H(m)]$ which contains at least $m+1$ primes. So if you colour the primes in $m$ colours, and you consider these intervals, you find that each one must contains at least two primes of the same colour. By pigeonhole principle again, there must be a colour for which this occurs infinitely often. Whence the existence in this colour class of infinitely gaps of size at most $H(m)$.

The squarefree numbers form a set which is well-known to have density $6/\pi^2$ in the integers, in other words their characteristic function $\mu^2$ satisfy $\sum_{n\leq x}\mu^2(n)=6/\pi^2x+o(x)$. In this set all your dream theorems about the count of linear patterns are true and easy.

Hardy-Littlewood squarefree tuples

The Hardy-Littlewood conjecture consists, in its strongest form, in an asymptotic, for every $\mathcal{H}=\{h_1,\ldots,h_k\}$, of the form

$\displaystyle{\lvert\{n\leq x\mid n+h_1,\ldots,n+h_k\text{ are all prime}\}\rvert=(c(\mathcal{H})+o(1))\frac{x}{\log^kx}}$

where $c(\mathcal{H})$ is a constant possibly equal to 0. It is quite far from being proven, although interesting steps towards it have been made in the last ten years, by Goldston-Pintz-Yildirim and Maynard.

In the squarefree numbers, the analogous problem is rather easy and was settled by Mirsky in 1947. He proved that

$\displaystyle{\lvert\{n\leq x\mid n+h_1,\ldots,n+h_k\text{ are all squarefree}\}\rvert=c(\mathcal{H})x+O(x^{2/3+\epsilon})}$

where

$\displaystyle{c(\mathcal{H})=\prod_p(1-\frac{\nu_{\mathcal{H}}(p^2)}{p^2})}$

and $\nu_{\mathcal{H}}(p^2)$ is the cardinality of the size of the projection of the set $\mathcal{H}$ in $\mathbb{Z}/p^2\mathbb{Z}$, that is the number of forbidden classes modulo $p^2$ for $n$.

I show a quick way of deriving it with a much worse error term $O(x/\log x)$. Fix $w$, a large number; ultimately, $w=C\log x$ should be ok. Fix $W=\prod_{p\leq w}p^2$. Then the number $X_w$ of $n\leq x$ such that none of $n+h_1,\ldots,n+h_k$ have a square factor $p^2$ with $p\leq w$ is easy to compute: in $[W]$ or any interval of length $W$ there are by chinese remainder theorem this many of them
$\displaystyle{ \prod_{p\leq w}(p^2-\nu_{\mathcal{H}}). }$
Thus in $[x]$, there are this many of them

$\displaystyle{x/W^2\prod_{p\leq w}(p^2-\nu_{\mathcal{H}})+O(W^2)}.$

There are also people who are not squarefree, but escape our sieving by primes less than $w$. These are the guys such that at least one of the $n+h_i$ have a divisor $p^2$ with $p>w$ prime. Now there are are at most $x/p^2$ multiples of $p^2$ in $[x]$. So at most
$\displaystyle{ \sum_{w
should be removed. Let’s try to balance the error terms $O(x/w)$ and $O(W^2)=\exp(2w(1+o(1))$; take $w=\frac{1}{2}(\log x-\log\log x)$.
This way both seem to be $O(x/\log x)$.

Green-Tao type theorem

We can also easily get asymptotics of the form

$\displaystyle{\sum_{n\in\mathbb{Z}^d\cap K}\prod_{i\in[t]}\mu^2(\psi_i(t))=\text{Vol}(K)(\prod_p\beta_p+o(1))}$

where $K\subset [0,N]^d$ is a convex body and $\Psi$ a system of affine-linear forms of which no two are affinely related. Moreover $\beta_p$ is again the proportion of non-forbidden vectors $a=(a_1,\ldots,a_d)$ of residues modulo $p^2$. One way to do it would be to simply use the Hardy-Littlewood type asymptotics proved above coordinate by coordinate. We can also prove it quite easily, by first observing that the set of n where at least one of the $\psi_i$ vanishes has size $O(N^{d-1})$ because the 0-set of an affine-linear form is an affine hyperplane. Then as $\psi_i(n)=O(N)$, the divisors $a_i$ will have to satisfy $a_i\ll \sqrt{N}$. So now we restrict the set of $n$ to the ones where the forms don’t vanish. Then we partition the sum into one where $a_1$ is restricted to be at most $N^{\delta}$ (we fix some small $\delta >0$), and the remaining one. Using the well-known fact that $\mu^2(n)=\sum_{d^2\mid n}\mu(d)$, we remark that the remaining one equals

$\displaystyle{ \sum_{N^{\delta}\leq a_1\ll \sqrt{N}}\mu(a_1)\sum_{n:a_1^2\mid \psi_1(n)}\mu^2(\psi_2(n))\cdots\mu^2(\psi_t(n))}$

Now the number of $n$ in the sum is $O(N^d/a_1^2+N^{d-1})$. Given that $\sum_{N^{\delta}\leq a_1} a_1^{-2}\ll N^{-\delta}$, we can bound this sum by $O(N^{d-\delta})$. We proceed similarly for $i=2,\ldots,t$, so that the sum to estimate is

$\displaystyle{ \sum_{a_1,\ldots,a_t\leq x^\delta}\prod_i\mu(a_i)\lvert\{n\in\mathbb{Z}^d\cap K\mid \forall i \quad \psi_i(n)\equiv 0\mod a_i^2\}\rvert}$

Quite gross volume packing arguments indicate that the number of integral points (lattice points) to estimate above is $\text{Vol}(K)\alpha_{\Psi}(a_1^2,\ldots,a_t^2)+O(N^{d-1+2t\delta})$, where

$\displaystyle{ \alpha_{\Psi}(k_1,\ldots,k_t)=\mathbb{E}_{n\in [\text{lcm}(k_i)]^d}\prod_i1_{k_i\mid\psi_i(n)}}$

is the density of zeroes modulo the $a_i^2$. Moreover we can extend the sum beyond $N^\delta$ at the cost of a mere $N^{O(\delta)}$. Hence finally for any $\epsilon >0$ we can write that

$\displaystyle{\sum_{n\in\mathbb{Z}^d\cap K}\prod_{i\in[t]}\mu^2(\psi_i(t))=\text{Vol}(K)\prod_p\beta_p+o_\epsilon(N^{d-1+\epsilon}).}$

Indeed by multiplicativity it is easy to transform $\sum_{a_1,\ldots,a_t}\alpha_{\Psi}(a_1^2,\ldots,a_t^2)\prod_i\mu(a_i)$ into a product over primes.

The quest for repeated, infinitely frequent patterns in the primes, is certainly a very old one, and is often even a quest for the asymptotic frequency of these patterns, which is much harder. For instance proving that there are infinitely many primes is easy, finding a satisfactory answer for the question « How many (roughly) until N large ? » much less so. Inbetween lies the obtention of upper and lower bounds of the right shape, due by Chebyshev.

Here pattern stands for images of a system of polynomial forms. Given $\Psi=(\psi_1,\cdots,\psi_t)\in(\mathbb{Z}[X_1,\cdots,X_d])^t$ and a convex set $K=K_N\subset\mathbb{R}^d$ of volume increasing to infinity, we are thus interested in evaluating

$\displaystyle{\sum_{n\in K\cap\mathbb{Z}^d}\prod_{i=1}^t\Lambda(\psi_i(n))}$

for N large, where $\Lambda$ is the von Mangoldt function. The case where $d=1$ and $\psi_i(n)=n+b_i$ is the original question of Hardy and Littlewood, who proposed a tantalizing asymptotic behaviour but is still completely out of reach (even the question whether there are infinitely many $n$ such that $n,n+2$ are both primes is not settled). But the case where the system $\Psi$ is affine-linear (thus the polynomials are all of degree 1) and no two forms are affinely dependent was solved by Green and Tao in the celebrated article Linear equations in primes.

Similar results for more general polynomial forms are rare. We have to mention the famous work of Friedlander and Iwaniec yielding an asymptotic for the number of primes of the form $p=x^2+y^4$, where it appears that

$\displaystyle{\sum_{x\leq\sqrt{N},y\leq N^{1/4}}\Lambda(x^2+y^4)}\sim Cx^{3/4}$

for some constant $C>0$.

I have uploaded yesterday an article on the ArXiv which provides asymptotics of the same shape as the ones in the Hardy-Littlewood for a few exceptional polynomial patterns.  Thus for instance, I can tell how many arithmetic progressions of three primes smaller than N exist whose common difference is a sum of two squares – well not quite, because I have to weigh these arithmetic progressions by the number of representations of the common difference. Now this weight, giving a positive density to the set of sums of two squares, which is sparse, of density $N/\sqrt{\log N}$, just as the von Mangoldt function is a weight (mostly) on primes giving them a density, cannot be easily eliminated afterwards, in contrast to the von Mangoldt function (one can write for $n\leq N$ that $\Lambda(n)\sim 1_\mathbb{P}(n)\log n\sim 1_\mathbb{P}(n)\log N$).

More precisely, the result that naturally comes out concerning three term arithmetic progressions with common difference a sum of two squares is

$\displaystyle{\sum_{1\leq a\leq a+2d\leq N}\Lambda(a)\Lambda(a+d)\Lambda(a+2d)R(d)=\pi N^2/4\prod_p\beta_p+o(N^2)}$

where $R(n)=\mid\{(x,y)\in\mathbb{Z}^2\mid n=x^2+y^2\}\mid$ is the representation function and $\beta_p$ are some explicit constant which I don’t reproduce here. Moreover, we can generalise to other positive definite binary quadratic forms than this one, and there’s nothing special about length three: an asymptotic is available for any length. Here we notice that in some sense, the result is only seemingly polynomial, and truly rather linear: the polynomial nature of the pattern is enclosed in a linear input into the representation function of a quadratic form.

In fact, my article contains a more general result of which the one above is only a particular case. My work consisted in mingling the von Mangoldt function with the representation functions of quadratic forms, whose behaviour on linear systems have been already analysed respectively in by Green and Tao and Matthiesen. The idea is to consider sums of the form

$\displaystyle{\sum_{n\in\mathbb{Z}^d\cap K}\prod_{i=1}^tF_i(\psi_i(n))}$

where $F_i$ can be the von Mangoldt function or a representation function, and the $\psi_i$ are linear forms. The cohabitation of both types of functions went quite well. One delicate point was to eliminate biases modulo small primes of both types functions, an operation known as the $W$-trick. The difficulty is that while the value of the von Mangoldt function is more or less determined by the coprimality to small primes, it is not so for the representation function, which is also sensitive to the residue modulo large powers of small primes. Once this issue is adressed carefully, it is possible to majorize them by one and the same pseudorandom majorant, which opens the way to the application of the transference principle.

Similarly, the cohabitation between the von Mangoldt function and the divisor function is quite natural, yielding asymptotics for expressions such as $\sum\Lambda(n)\Lambda(n+ab)\Lambda(n+2ab)=\sum\Lambda(n)\Lambda(n+d)\Lambda(n+2d)\tau(d)$. This is reminiscent of the Titchmarsh divisor problem, the evaluation of $\sum_n\Lambda(n)\tau(n+a)$ or (almost equivalently) of $\sum_p\tau(p+a)$, but the latter expression involves a linear system of infinite complexity, and is thus altogether out of reach of my method, just as the twin primes or the basic Hardy-Littlewood conjecture.

One may hope to extend the generalised Hardy-Littlewood conjecture stated (and proven in the finite complexity case in the paper linked) by Green and Tao to polynomial systems. For example given a polynomial $\displaystyle{\phi \in \mathbb{Z}[x_1,\ldots,x_d]}$ and a bounded domain $K\subset \mathbb{R}^d$ (probably a convex body would be more reasonable), one may be interested in an asymptotic for

$\displaystyle{\lvert\{x=(x_1,\ldots,x_d)\in K\cap \mathbb{Z}^d\mid \phi(x)\text{ is prime}\}\rvert.}$

We will rather try to get an asymptotic of the form

$\displaystyle{\sum_{(x_1,\ldots,x_d)\in\mathbb{Z}^d\cap K}\Lambda(\phi(x_1,\ldots,x_d))=\beta_{\infty}\prod_p\beta_p+o(\beta_{\infty})}$

where $\beta_{\infty}$ is basically the number of points with positive integer coordinates in $K$, so hopefully $\beta_{\infty}=\text{Vol}(K\cap \phi^{-1}(\mathbb{R}_+))$, and the local factors take into account the obstructions or the absence of obstructions to primality modulo $p$. Recall that $\Lambda$ classically denotes the von Mangoldt function. There are only very few non-linear polynomials for which an asymptotic for the number of prime values is available. The easiest one is obviously $\phi(x,y)=x^2+y^2$. Indeed, the primes which are sum of two squares are the primes congruent to 1 modulo 4 (and also the prime 2, but it’s a single prime, so we don’t have to care), and they are represented 8 times each as a value of $\phi$. So

$\displaystyle{\sum_{x^2+y^2\leq N}\Lambda(x^2+y^2)\sim\sum_{n\leq N,n\equiv 1\text{ mod } 4}\Lambda(n)\sim 4N.}$

Now let’s check what the conjecture would say. Here $\beta_{\infty}=\text{Vol}(\{(x,y)\mid x^2+y^2\leq N\})\sim \pi N$. What about the $\beta_p$ ? They are supposed to be $\beta_p=\frac{p}{p-1}\mathbb{P}(\phi(x,y)\neq 0\text{ mod } p)$. Now it easy to check that $\mathbb{P}(\phi(x,y)\equiv 0\text{ mod } p)=\frac{2p-1}{p^2}$ if $p\equiv 1\text{ mod } 4$, and $\mathbb{P}(\phi(x,y)\equiv 0\text{ mod } p)=\frac{1}{p^2}$ otherwise. So $\beta_p=\frac{p-1}{p}$ in the former case, and $\beta_p=\frac{p+1}{p}$ in the latter case. Thus, $\prod_p\beta_p$ doesn’t converge absolutely, in contrast with the traditional Green-Tao situation, where $\beta_p=1+O(1/p^2)$… However, if we imagine that to each prime $p\equiv 1\text{ mod } 4$ corresponds a prime $p'\equiv 3 \text{ mod } 4$ with $p\approx p'$, we could compute the product of the $\beta_p$ by grouping the factors into pairs $\beta_p\times\beta_{p'}\approx 1-1/p^2$. A bit more precisely,

$\displaystyle{\prod_{p\equiv 1\text{ mod } 4,p\leq N}(1+1/p)\sim C\sqrt{N}}$

and

$\displaystyle{\prod_{p\equiv 1\text{ mod } 4,p\leq N}(1-1/p)\sim C'/\sqrt{N}}$

which implies that

$\displaystyle{\prod_{p\leq N}\beta_p}\sim CC'$

so this product is convergent, although not absolutely. I guess the constant $C'$ is $e^{-\gamma/2}$, see Mertens theorem, and that $C=e^{\gamma/2}\prod_{p\equiv 1\text{ mod } 4}(1-p^{-2})$. I don’t really know how to compute this convergent product. We quickly notice that $\beta_2=1$. If the product could be equal to $4/\pi$, which it can’t unfortunately, being smaller than 1, we would apparently get the correct result, but this remains a quite dodgy case, which should make one cautious about stating ambitious generalisations of the Hardy-Littlewood conjecture.

When one considers the sequence of values $f(n)$ of some reasonable arithmetic function, like the divisor function $\tau$, or the function $\omega(n)$ the number of prime factors of $n$, one may notice that excluding a few abnormal values, the sequence looks very much like a neat, recognizable sequence. Or at least, smoothing the sequence by averaging it produces a sequence which looks neat.

Thus $\omega(n)$ has normal order $\log\log n$: almost all numbers $n$ have about $\log\log n$ ; this is the theorem of Hardy-Ramanujan. It’s even better than this : for almost all numbers, $\omega(n)$ is within $(\log\log n)^{1/2+\epsilon}$ of its model $\log\log n$. In fact the values of $\frac{\omega(n)-\log\log n}{\sqrt{\log\log n}}$ follow a Gaussian distribution, as the Erdös-Kac theorem reveals.

In contrary the divisor function $\tau$ doesn’t have any normal order, but it does have an average order which is quite good-looking, namely $\log n$ (while $\log\tau$ has a normal order which, surprisingly isn’t $\log\log n$ but a bit a less, namely $\log 2\log\log n$, showing that a few exceptionally highly divisble numbers are able to make the average deviate substantially).

Now of course on the primes $\tau$ and $\omega$ collapse violently, being stuck at respectively 2 and 1. The question is whether just after such a shocking value, at $n=p+1$, one recovers more normal arithmetic indices, whether there are traces of the past catastrophe (or signs of the coming castrophe just before it).

Of course $p+1$ isn’t absolutely any number, for instance it’s surely even; but then is $(p+1)/2$ just any number? It must also be said that it has higher chances to be mutiple of 3 than a generic number, in fact one in two chance instead of one in three, because $p+1\equiv 1\text{ or } 0\text{ mod }3$ whith equal probability.

From the point of view of $\omega$

There the answer is: yes, perfectly. Indeed, Heini Halberstam established that the Erdös-Kac theorem holds when $n$ is constrained to range in the shifted primes $p+1$. That is, $\omega(p+1)$ lies within $M\sqrt{\log\log p}$ of $\log\log p$ for a positive proportion $p(M)$ of primes, the density being again gaussian.

From the point of view of $\tau$

Not quite. For this consider the Titchmarsh divisor problem, consisting in estimating $\sum_{p\leq x} \tau(p+1)$. If $\tau(p+1)$ was replaceable by $\log(p+1)$ as we suppose, then this sum would be asymptotic to $\sum_{p\leq x}\log p\sim x$ by Mertens formula. It turns out that it is in fact asymptotic to $\frac{\zeta(2)\zeta(3)}{\zeta(6)}x$, the constant prefactor being well over 1,9, so that it can be said that $p+1$ has almost twice as many divisors as banal numbers of his size. Now remember it’s always even; now there are good heuristic reasons to believe that $\sum_{n\leq x,n\equiv [0] 2}\tau(n)\equiv 2/3x\log x$, so that in average $\tau(2n)$ is close to $4/3\log n$, but 1,9 is still much larger sensibly larger than 4/3. Here the higher chances of $p+1$ to be divisible by 3, 5 etc. in comparison to a banal number weigh enough to make the average deviate.

It would be interesting to determine whether $\log \tau(p+1)$ has a normal order, in the same vein as Halberstam’s result.

Another criterion of banality: the probability of being squarefree

As we know, the density of squarefree numbers is $1/\zeta(2)=\prod_{q \text{ prime}}(1-q^{-2})$. It is then reasonable to wonder if among numbers of the forme $n=p+1$, the proportion of square free is the same. It’s clear to see that it can’t quite be so: indeed, $p+1$ has one on two chance of being divisible by the square 4 (while a generic number has of course one on four chance), one on six chances of being divisible by 9, one on $q(q-1)$ chance of being divisibly by the prime $q$. So one guesses that $p+1$ is a little less likely to be squarefree than any number; indeed, it’s been proven by Mirsky that the proportion of squarefree numbers among numbers of the form $p+1$ is $\prod_q (1-1/(q(q-1))$, to compare with $\prod_{q \text{ prime}}(1-q^{-2})$.

The property of being squarefree can be replaced by the property of being $k$-free for instance, the density among shifted primes being then $\prod_q (1-1/(q^{k-1}(q-1))$, to compare with $\prod_{q \text{ prime}}(1-q^{-k})$.

Appendix: average of the divisor function on odd and even numbers

We know that

$\displaystyle{x\log x\sim\sum_{n\leq x}\tau(n)=\sum_{\substack{n\equiv 0\mod [2]\\n\leq x}}\tau(n)+\sum_{\substack{n\equiv 0\mod [2]\\n\leq x}}\tau(n)=S_0(x)+S_1(x)}$

And let us suppose that $\sum_{n\equiv 0\mod [2],n\leq x}\tau(n)\sim Ax\log x$ and $\sum_{n\equiv 1\mod [2],n\leq x}\tau(n)\sim Bx\log x$.

Now the sum on even numbers can be decomposed into a sum on numbers of 2-adic valuation 1,2,3… But numbers of 2-adic valuation k are numbers which are of the form 2^km with $m$ odd, so

$\displaystyle{\sum_{v_2(n)=k,n\leq x}\tau(n)=(k+1)\sum_{n\leq x/2^k,n\equiv 1[2]}\tau(n)\sim(k+1)Bx/2^k\log x}$ for fixed $k$. Pretending this asymptotic also holds when $k$ is allowed to grow with $x$, we infer from $S_0(x)=2S_1(x/2)+(3-2)S_1(x/4)+(4-3)S_1(x/8)\cdots$ that $C=D+D/2+D/4\cdots=2D$. Hence $S_0(x)\sim 2x/3\log x$ and $S_1(x)\sim x/3\log x$.

It might be that this intuition has been proven more rigorously, I’d like to see where!

The genuine prime number theorem says that there are about $N/\log N$ prime numbers between 1 and N. The prime number theorem in function fields is rather about the number $a_n$ of irreducible monic polynomials of degree $n$ in the polynomial ring $A=\mathbb{F}_q[t]$ on a finite field $\mathbb{F}_q$. It is remarkable that the result is of the same shape $q^n/n$, which is $N/\log N$ if you let $N=q^n$ be the number of monic polynomials of degree $n$. But the proof, as often with the polynomial analogues of number theory, is much neater and even yields a nice exact formula for $a_n$

$\displaystyle{q^n=\sum_{d\mid n}da_d\text{ and by Moebius inversion } a_n=\frac{1}{n}\sum_{d\mid n}\mu(d)q^{n/d}}$

First proof

This one requires a good deal of knowledge of the theory finite fields.

We prove that $\Phi=T^{q^n}-T$ is the product $\Psi$ of all monic irreducible polynomials whose degree divide $n$, which immediately provides the above formula by comparison of the degrees. To prove this, notice that the roots of $\Phi$ are all the elements of the extension $\mathbb{F}_{q^n}$ of $\mathbb{F}_q$, so that $\Phi$, as an element of $\mathbb{F}_{q^n}[t]$ can be factored as $\prod_{\alpha\in \mathbb{F}_{q^n}}(T-\alpha)$. On the other hand, if $P$ is an irreducible polynomial of degree $m\mid n$, it splits into linear factors in $A/(P)\sim \mathbb{F}_{q^m}\leq \mathbb{F}_{q^n}$. So $P$ has a root $\alpha\in \mathbb{F}_{q^n}$ and all its other roots are images of $\alpha\in \mathbb{F}_{q^n}$ under the Frobenius element, more precisely,

$P=(T-\alpha)(T-\alpha^{q^{n/m}})\cdots (T-\alpha^{q^{n(m-1)/m}})$

so that $P\mid \Phi$ (at first sight in $\mathbb{F}_{q^n}[t]$, but also in $A$ because they are both in $A$ – it is well known that the gcd is preserved under field extensions). Moreover, all monic irreducible $P$ are coprimes, so that their product still divides $\Phi$. Finally, we need to prove the converse divisibility, either in $A$ or in $\mathbb{F}_{q^n}[t]$, which is equivalent. So let’s do it in the latter; take a factor $T-\alpha$ with $\alpha \in\mathbb{F}_{q^n}$. Let $m\mid n$ be the order of the Frobenius element on $\alpha$, so that $\alpha^{q^m}=1$ and no smaller integer has this property; then $(T-\alpha)\mid (T-\alpha)(T-\alpha^{q^{n/m}})\cdots (T-\alpha^{q^{n(m-1)/m}})\in A$ which is indeed an irreducible of degree $m\mid n$, hence a factor of $\Psi$. Given the factorization of $\Phi$ and the fact that the factors $T-\alpha$ are coprime, their product $\Phi$ still divides $\Psi$

Second proof

This one proves exactly what we want and not more and uses only power series. We need the notation $\vert f\vert=q^{\text{deg }f}$.

Let

$\displaystyle{\zeta_A(s)=\prod_{P\in A\text{ monic irreducible}}(1-\vert P\vert ^{-s})^{-1}=\sum_{f\in A\text{ monic}}\vert f\vert^{-s}}$

be the zeta function (for s of real part >1). It is easy to see that $\zeta_A=(1-q^{1-s})^{-1}$ by looking at the sum expression, and $\zeta_A=\prod_{d\geq 1}(1-q^{-ds})^{-a_d}$ by looking at the product expression. Putting $u=q^{-s}$, both sides become nice power series so that we get the equality

$\displaystyle{\frac{1}{1-qu}=\prod_{d=1}^{+\infty}(1-u^d)^{-a_d}}$

and doing classical operations (logarithmic derivation and expansion in power series followed by termwise identification in particular) on both sides yield the result.

This proof seems to use about zero input of algebra, so its success looks like a miracle. The only step where a (tiny) bit of knowledge seems to be used is the transformation of the sum into an Euler product – it needs the existence of unicity of the decomposition of a monic polynomial into product of irreducibles.

A kind of sieve method ?

It could be reasonable to hope that the sieve of Eratosthenes would work better in the realm of polynomial rings. We’ll try to adapt it here. Notice that in this case we have count primes up to a certain large degree, and not of exactly certain large degree (both quantities are far from equivalent in fact !).

Let’s define $\pi(n)$ to be the number of monic irreducible (let’s say primes) polynomials of degree at most n, and $\delta(m)=1_{m=1}$. We also use Moebius’ function for monic polynomials, also defined in terms of the number of prime factors. The Moebius inversion formula $\delta=\mu\star 1$ still holds in the polynomial world. We also write

$\displaystyle{\Phi=\prod_{P \text{mon. irred,deg} P\leq n/2}P}$

Thus

$\displaystyle{\pi(n)-\pi(n/2)+1=\sum_{f\text{ monic of deg}\leq d}\delta((f,\Phi))=\sum_{d\mid\Phi}q^{n-\mathrm{deg}d}\mu(d)1_{\mathrm{deg} d\leq n}}$

and if we forget about $1_{\mathrm{deg} d\leq n}$ we obtain exactly

$\displaystyle{q^n\prod_{P \text{mon. irred,deg} P\leq n/2}(1-\vert P\vert ^{-1})}$

this looks quite nice, but the error term, just like in the integer case, can’t easily be bounded by anything better than $O(2^{\pi(n/2)})$ which we know will be vastly bigger than the main term. By the way this product can very easily be shown to be $\ll 1/n$ but the exact asymptotic is not any easier than the PNT itself… So this sieve seems to have very little interest, given the convenience of the other methods.

Two posts in french have already appeared on this topic, showing that the maximal number of unit vectors having pairwise negative (resp. non-positive) inner products grow linearly wit the dimension of the space $\mathbb{R}^n$ they live in. This is good fun; however, as mentioned in a comment, it becomes even better fun when one bothers asking, for $\epsilon >0$ how big a family of unit vectors $v_1,\cdots,v_k\in \mathbb{R}^n$ can be if…

1. … for all $i\neq j$, one requires $v_i\cdot v_j<\epsilon$;
2. … for all $i\neq j$, one requires $v_i\cdot v_j<-\epsilon$;

Let’s deal with the first question. This involves the remarkable fact that on the sphere of very large dimension, almost all the area is concentrated on a small band near the equator (or any equator). This is the phenomenon called concentration of measure ; so pick a vector $v_1\in\mathbb{S}^n$. To continue your family, you have to pick a vector outside the spherical ball $B=\{x\mid x\cdot v_1\geq\epsilon\}$. So the concentration of measure implies that the forbidden area is of order $C\exp(-cn\epsilon^2)$. When you have picked already $k$ vectors, the forbidden area for the choice of the next one is at most $kC\exp(-cn\epsilon^2)$ which is strictly less than 1 until $k=C^{-1}\exp(cn\epsilon^2)$. Then not all the sphere is forbidden, so you can accommodate certainly $C^{-1}\exp(cn\epsilon^2)$: exponentially (in the dimension) many instead of linearly.

The second one is easier. Indeed,

$0\leq \lVert \sum_{i=1}^k v_i \rVert^2=k+2\sum_{i

which implies $k\leq \epsilon^{-1}+1$ so we get a constant bound, certainly much lower than $n+1$ for large dimensions.