Random sequence of integers in $\{1, 2, \dots, n \}$ which is "everywhere probably increasing" - how long can it be?

I adapt an argument from this blog post of mine, exploiting the $\ell^2$ boundedness of the discrete Hilbert transform (i.e. Hilbert's inequality), to obtain an exponential upper bound. I don't see any obvious way to improve this to a polynomial bound (EDIT: A Whitney decomposition seems to do the trick, see below). The argument is inspired by a stability result of Hrushovski (based on de Finetti's theorem) which shows that some finite bound exists, although it is not easy to extract a quantitative bound from Hrushovski's argument (and if one did, it would surely be worse than exponential); see Proposition 2.25 of Hrushovski's paper.

Suppose that ${\bf P}(d_j > d_i) \geq r$ for some $r>1/2$ and all $1 \le i < j \leq k$. Then of course
$${\bf P}( d_j > d_i) - {\bf P}( d_j < d_i ) \geq 2r-1$$
for all $1 \le i < j \le k$. We expand this as
$$ \sum_{1 \leq a < b \leq n} {\bf P}( d_j = b \wedge d_i = a ) - {\bf P}( d_j = a \wedge d_i = b ) \geq 2r-1.$$
Multiplying by the positive quantity $\frac{1}{j-i}$ and summing over $1 \leq i < j \leq k$ (using $\sum_{1 \leq i < j \leq k} \frac{1}{j-i} \gg k \log k$), we conclude that
$$ \sum_{1 \leq a < b \leq n} \sum_{1 \leq i < j \leq k} \frac{1}{j-i} [{\bf P}( d_j = b \wedge d_i = a ) - {\bf P}( d_j = a \wedge d_i = b )] \gg (2r-1) k \log k \qquad (1).$$
The LHS can be rearranged (swapping the roles of $i$ and $j$ in the second term) as
$$ \sum_{1 \leq a < b \leq n} \sum_{1 \leq i,j \leq k: i \neq j} \frac{1}{j-i} {\bf P}(d_j = b \wedge d_i = a ) $$
and rearranged further as
$$ {\bf E} \sum_{1 \leq a < b \leq n} \sum_{1 \leq i,j \leq k: i \neq j} \frac{1}{j-i} 1_{d_j = b} 1_{d_i = a}.$$
By Hilbert's inequality, we have
$$ \sum_{1 \leq i,j \leq k: i \neq j} \frac{1}{j-i} 1_{d_j = b} 1_{d_i = a} \ll \Big(\sum_{1 \leq j \leq k} 1_{d_j=b}\Big)^{1/2} \Big(\sum_{1 \leq i \leq k} 1_{d_i=a}\Big)^{1/2}$$
and
$$ {\bf E} \sum_{1 \leq a < b \leq n} \sum_{1 \leq j \leq k} 1_{d_j=b},\ {\bf E} \sum_{1 \leq a < b \leq n} \sum_{1 \leq i \leq k} 1_{d_i=a} \ll k n, $$
so by Cauchy-Schwarz
$$ {\bf E} \sum_{1 \leq a < b \leq n} \sum_{1 \leq i,j \leq k: i \neq j} \frac{1}{j-i} 1_{d_j = b} 1_{d_i = a} \ll kn $$
and hence
$$ kn \gg (2r-1) k \log k,$$
leading to the exponential upper bound
$$ k \ll \exp\Big( O\Big( \frac{n}{2r-1} \Big) \Big).$$
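The key analytic input here, Hilbert's inequality $|\sum_{i \neq j} u_j \overline{u_i}/(j-i)| \leq \pi \sum_i |u_i|^2$, is easy to sanity-check numerically. Here is a quick check (Python/NumPy; the code is my addition, not part of the argument) on random unimodular coefficients:

```python
import numpy as np

# Numerical sanity check of Hilbert's inequality:
#   |sum_{i != j} u_j conj(u_i) / (j - i)| <= pi * sum_i |u_i|^2.
rng = np.random.default_rng(0)
k = 200
u = np.exp(2j * np.pi * rng.random(k))   # random unimodular coefficients

diff = np.arange(k)[:, None] - np.arange(k)[None, :]   # diff[j, i] = j - i
kernel = np.zeros((k, k))
kernel[diff != 0] = 1.0 / diff[diff != 0]

bilinear = abs(u @ kernel @ u.conj())    # |sum_{i != j} u_j conj(u_i)/(j-i)|
assert bilinear <= np.pi * k             # here sum_i |u_i|^2 = k
```

The constant $\pi$ is sharp in the limit $k \to \infty$ but is never attained for finite $k$, so the inequality above holds strictly.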

EDIT: Looks like one can improve this to the polynomial bound $k \ll n^{O(1/(2r-1))}$ using the following standard Whitney decomposition trick (used for instance to prove the Rademacher-Menshov theorem or the Christ-Kiselev lemma). Firstly, without loss of generality we may take $n$ to be a power of 2. Then observe that if $1 \leq a < b \leq n$, then there is a unique pair of distinct dyadic intervals $I,J$ in $\{1,\dots,n\}$ with the same parent such that $a \in I$ and $b \in J$; let's call such pairs "adjacent". As such, the LHS of (1) can now be rearranged as

$$ {\bf E}\sum_{2^l < n} \sum_{I,J: |I|=|J|=2^l, \hbox{adjacent}} \sum_{1 \leq i,j \leq k: i \neq j} \frac{1}{j-i} 1_J(d_j) 1_I(d_i).$$

We apply Hilbert's inequality to bound this by $$ \pi {\bf E} \sum_{2^l < n} \sum_{I,J: |I|=|J|=2^l, \hbox{adjacent}} \Big(\sum_{1 \leq j \leq k} 1_J(d_j)\Big)^{1/2} \Big(\sum_{1 \leq i \leq k} 1_I(d_i)\Big)^{1/2},$$ which by Cauchy-Schwarz and the disjointness of the intervals $I$ (and of the $J$) at each level $l$ can be bounded by $$ \pi \sum_{2^l < n} k^{1/2} k^{1/2} \ll k \log n,$$ leading to $$ k \log n \gg (2r-1) k \log k $$ and thus $k \ll n^{O(1/(2r-1))}$.
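The combinatorial fact underlying the Whitney decomposition — that each pair $a < b$ is separated by exactly one adjacent pair of dyadic intervals — can be verified directly. A small check (Python; my addition, with my own naming of the interval enumeration):

```python
def adjacent_pairs(n):
    """All 'adjacent' pairs (I, J): distinct dyadic intervals of equal length
    2^l (with 2^l < n) sharing the same parent, I left of J, inside {1,...,n}."""
    length = n // 2
    while length >= 1:
        for start in range(1, n + 1, 2 * length):
            yield range(start, start + length), range(start + length, start + 2 * length)
        length //= 2

# Every pair 1 <= a < b <= n is separated by exactly one adjacent pair:
# the two children of the smallest dyadic interval containing both a and b.
n = 32  # a power of 2
for a in range(1, n + 1):
    for b in range(a + 1, n + 1):
        hits = sum(1 for I, J in adjacent_pairs(n) if a in I and b in J)
        assert hits == 1, (a, b)
```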


Using Fourier analysis in the $d$ variable, we can get the optimal upper bound. As in Tao's argument, if there is a distribution of sequences for which each pair is $r$-probably increasing, there must be a single sequence $d_i$ such that:

$$\sum_{1\leq a <b \leq n} \sum_{1 \leq i,j \leq k, i\neq j} \frac{1}{j-i} 1_{d_i = a} 1_{d_j=b} >(1+ o(1)) (2r-1) k \log k$$

We may rewrite the left hand side as:

$$\sum_{1\leq a ,b \leq n} \sum_{1 \leq i,j \leq k, i\neq j} \frac{1}{j-i} 1_{d_i = a} 1_{d_j=b} 1_{a<b}$$

Now use the "carrying the 1" decomposition:

$$ 1_{a<b} = \frac{b}{n} - \frac{a}{n} + \frac{(a-b) \bmod n}{n},$$ where the representative of $(a-b) \bmod n$ is taken in $\{0,\dots,n-1\}$ (so both sides vanish when $a = b$).
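This identity is easy to confirm by brute force. A quick check (Python; my addition), using exact rational arithmetic and Python's `%`, which already returns the representative in $\{0,\dots,n-1\}$:

```python
from fractions import Fraction

# Verify 1_{a<b} = b/n - a/n + ((a - b) mod n)/n for all 1 <= a, b <= n.
n = 7
for a in range(1, n + 1):
    for b in range(1, n + 1):
        indicator = 1 if a < b else 0
        assert Fraction(b - a + (a - b) % n, n) == indicator, (a, b)
```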

The first two terms are pretty simple. The first term is:

$$\sum_{1 \leq i,j \leq k, i\neq j} \frac{1}{j-i} \frac{d_j}{n} \approx \sum_{1 \leq j \leq k} \frac{d_j}{n} \log \left( \frac{j}{k-j} \right) \ll k, $$

since $\sum_{i \neq j} \frac{1}{j-i} = H_{j-1} - H_{k-j} \approx \log \frac{j}{k-j}$ and $0 < d_j/n \leq 1$,

and the second term is similar.
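The harmonic-number identity behind this estimate can be checked numerically. A small check (Python; my addition) of $\sum_{i \neq j} \frac{1}{j-i} = H_{j-1} - H_{k-j}$ and its logarithmic approximation:

```python
from math import log

def H(m):
    """Harmonic number H_m = 1 + 1/2 + ... + 1/m (H_0 = 0)."""
    return sum(1.0 / t for t in range(1, m + 1))

k = 1000
for j in (2, 137, 500, 900, k - 1):
    s = sum(1.0 / (j - i) for i in range(1, k + 1) if i != j)
    # Exact identity: sum_{i != j} 1/(j - i) = H_{j-1} - H_{k-j} ...
    assert abs(s - (H(j - 1) - H(k - j))) < 1e-9
    # ... which equals log(j / (k - j)) up to lower-order corrections.
    assert abs(s - log(j / (k - j))) < 1 / (j - 1) + 1 / (k - j)
```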

Now we've replaced $1_{a<b}$ with a convolution kernel in $a - b$ (mod $n$). So let's take the discrete Fourier transform; the remaining term becomes:

$$\frac{1}{n}\sum_{0 \leq \xi \leq n-1} \sum_{1 \leq i,j \leq k, i\neq j} \frac{1}{j-i} e\left( \frac{d_j \xi}{n}\right) e\left( \frac{- d_i \xi}{n}\right) \left( \sum_{x=0}^{n-1} \frac{x}{n} e \left( \frac{x \xi}{n} \right)\right)$$

By Hilbert's inequality, applied to the unimodular coefficients $e(d_j \xi / n)$:

$$\left|\sum_{1 \leq i,j \leq k, i\neq j} \frac{1}{j-i} e\left( \frac{d_j \xi}{n}\right) e\left( \frac{- d_i \xi}{n}\right)\right| \leq \pi k$$

So the bound is

$$\frac{\pi k}{n}\sum_{0 \leq \xi \leq n-1}\left| \sum_{x=0}^{n-1} \frac{x}{n} e \left( \frac{x \xi}{n} \right)\right| \approx \frac{\pi k}{n} \left( \frac{n}{2} + \sum_{1 \leq \xi \leq n-1} \frac{n}{2 \pi \min(\xi,n-\xi)} \right) \approx k \log n$$

(the $\xi = 0$ term contributes only $O(k)$).

This gives:

$$k \log n > (1+o(1)) (2r-1) k \log k$$

Hence:

$$n > k^{ 2r-1 +o(1) }$$
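As a sanity check on the Fourier coefficient estimate used above: the magnitude $|\sum_{x=0}^{n-1} \frac{x}{n} e(\frac{x\xi}{n})|$ has the closed form $\frac{1}{2\sin(\pi\xi/n)}$ for $\xi \neq 0$, which is comparable to $\frac{n}{2\pi\min(\xi, n-\xi)}$. A quick numerical confirmation (Python/NumPy; my addition):

```python
import numpy as np

n = 1024
x = np.arange(n)
xi = np.arange(1, n)                     # nonzero frequencies

# |sum_{x=0}^{n-1} (x/n) e(x xi / n)|, computed directly, versus the
# closed form 1 / (2 sin(pi xi / n)).
dft = np.abs(np.exp(2j * np.pi * np.outer(xi, x) / n) @ (x / n))
exact = 1 / (2 * np.sin(np.pi * xi / n))
assert np.allclose(dft, exact)

# Comparable to n / (2 pi min(xi, n - xi)): the ratio is t / sin(t)
# with t = pi min(xi, n - xi) / n, which lies in [1, pi/2].
approx = n / (2 * np.pi * np.minimum(xi, n - xi))
ratio = dft / approx
assert 1 - 1e-9 <= ratio.min() and ratio.max() <= np.pi / 2 + 1e-9
```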

Note that the base notation trick in the original post allows us to remove the $o(1)$ in the exponent bound by amplification.


Here is the matching lower bound:

Suppose that I have two sequences of $k$ numbers: one a sequence of numbers from $1$ to $n_1$ in which each pair is increasing with probability at least $r_1$, and one a sequence from $1$ to $n_2$ in which each pair is increasing with probability at least $r_2$. For any fraction $a/b$ with integers $0 \leq a \leq b$, I can construct a sequence of $k^b$ numbers from $1$ to $n_1^a n_2^{b-a}$ in which each pair is increasing with probability at least $\frac{a}{b} r_1 + \frac{b-a}{b}r_2$, as follows:

Write a number from $1$ to $k^b$ in base $k$ notation as $b$ numbers from $1$ to $k$. Choose a uniformly random set of $a$ of the $b$ digit positions and apply the first sequence to the digits in those positions, getting numbers from $1$ to $n_1$; to the remaining digits, apply the second sequence, getting numbers from $1$ to $n_2$. Encode the resulting string of numbers lexicographically as a single number from $1$ to $n_1^a n_2^{b-a}$ (generalizing base notation).

For any two indices $i < j$, consider the first digit position where they differ. In that position the digit of $j$ must be larger than that of $i$, and $d_j$ agrees with $d_i$ in all previous positions. Hence as long as $d_j > d_i$ in this position, $d_j > d_i$ overall. The probability of this is at least $r_1$ if the position was one of the $a$ chosen and at least $r_2$ if it wasn't, so the total probability is at least $\frac{a}{b} r_1 + \frac{b-a}{b}r_2$.
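The encoding step can be sketched in code. A minimal sketch (Python; the function name `compose` and the fixed `chosen` set are my own — in the probabilistic argument the set of positions is chosen at random, but the encoding itself is deterministic once the set is fixed):

```python
def compose(seq1, n1, seq2, n2, a, b, chosen):
    """Build a length-k^b sequence into {1, ..., n1^a * n2^(b-a)} from two
    length-k sequences (0-indexed lists mapping digits 0..k-1 to values),
    applying seq1 on the digit positions in `chosen` and seq2 elsewhere,
    then recombining the outputs lexicographically (mixed-radix encoding)."""
    k = len(seq1)
    assert len(seq2) == k and len(chosen) == a <= b
    out = []
    for idx in range(k ** b):
        # Base-k digits of idx, most significant first.
        m, digits = idx, []
        for _ in range(b):
            m, d = divmod(m, k)
            digits.append(d)
        digits.reverse()
        # Map each digit through seq1 or seq2 and encode in mixed radix.
        value = 0
        for pos, d in enumerate(digits):
            if pos in chosen:
                value = value * n1 + (seq1[d] - 1)
            else:
                value = value * n2 + (seq2[d] - 1)
        out.append(value + 1)
    return out

# Sanity check: two strictly increasing components (r1 = r2 = 1) give a
# strictly increasing composite of length k^b into {1, ..., n1^a * n2^(b-a)}.
seq = compose([1, 2, 3], 3, [1, 3, 4], 4, 1, 2, chosen={0})
assert all(x < y for x, y in zip(seq, seq[1:]))
assert len(seq) == 3 ** 2 and max(seq) <= 3 * 4
```

Because the mixed-radix encoding preserves lexicographic order of the digit strings, any comparison between two composite values reduces to the comparison at the first differing digit, exactly as in the argument above.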

This shows that we can take $k=n^{1/f(r)}$ for a convex function $f$. In particular, because we can take $f(1)=1$ (a totally deterministic increasing sequence) and $f(1/2-\epsilon)=0$ (a totally random sequence), we know $f(r) \leq 2r-1+\epsilon$, so we can get $k$ at least $n^{1/(2r-1)-\epsilon}$.


An explanation of why $1/(j-i)$ is in fact the right weight function: this lower bounding method spends an equal amount of work improving the probability that $d_j>d_i$ at each scale of $j-i$, so the weight function should make every scale equally valuable, which $1/(j-i)$ does.