The largest Wasserstein distance to uniform distribution among all probability distributions with uniform marginals

It is an interesting question.

Your guess about the dependence on the parameter $p$ is correct. In your case, every probability distribution with uniform marginals is a copula, and this family includes the 2-d uniform distribution, which I denote by $\pi_0$ in the following discussion. The minimal Wasserstein distance between two copulas depends on the parameter $p$ of the Wasserstein distance; see Prop. 1.1 of [Alfonsi&Jourdain]. From this it is not hard to see that the maximal Wasserstein distance also depends on $p$: apply the "coarsest" Fréchet–Hoeffding copula bounds in each marginal dimension and carry them through the calculation of the Wasserstein distance. A concrete example with $p=2$ can be found in [Cuesta-Albertos et.al].
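For reference, the two standard objects used above can be written out as follows. The notation is my own assumption and not taken from the question: $\Gamma(\mu,\nu)$ is the set of couplings of $\mu$ and $\nu$, $\|\cdot\|$ is some fixed norm, and $C$ is any bivariate copula.

$$ W_p(\mu,\nu) = \left( \inf_{\gamma \in \Gamma(\mu,\nu)} \int \|x-y\|^p \, d\gamma(x,y) \right)^{1/p}, \qquad \max(u+v-1,\,0) \;\leq\; C(u,v) \;\leq\; \min(u,v) \quad \text{for } u,v \in [0,1]. $$

The left and right sides of the second display are the lower and upper Fréchet–Hoeffding bounds in dimension two.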

Now to the other part of your question: what is the maximal Wasserstein distance to $\pi_0$? This is equivalent to finding geodesics on the submanifold determined by the copula $C_{unif}$ in the space of probability measures metrized by the Wasserstein distance. To the best of my knowledge, this problem is unsolved in general unless you restrict the family of probability distributions under consideration.

One notable attempt is [Ambrosio et.al], whose work is also on $p=2$. If you metrize this copula, then I think you only need to find the complementary geodesic in a circular neighborhood of $\pi_0$ (geodesics in a circular neighborhood of $\pi_0$ correspond to the distributions with minimal Wasserstein ($L^2$) distance to $\pi_0$). For the general case $p\neq 2$, I would also be interested in knowing more.

One more comment: the Wasserstein distance is a measure of dissimilarity, so we usually talk about minimizing it rather than maximizing it. The OP seems to be asking for a bound on the Wasserstein distance over a general family. As you said in the comment, if the motivation is only a convex optimization problem, I wonder whether it could be rephrased as a minimization problem by some sort of duality.

References

[Alfonsi&Jourdain] Alfonsi, Aurélien, and Benjamin Jourdain. "A remark on the optimal transport between two probability measures sharing the same copula." Statistics & Probability Letters 84 (2014): 131-134.

[Cuesta-Albertos et.al] Cuesta-Albertos, Juan A., Carlos Matrán Bea, and Jesús M. Rodríguez Rodríguez. "Shape of a distribution through the L2-Wasserstein distance." Distributions With Given Marginals and Statistical Modelling. Springer Netherlands, 2002. 51-61.

[Ambrosio et.al] Ambrosio, Luigi, Nicola Gigli, and Giuseppe Savaré. "Gradient flows with metric and differentiable structures, and applications to the Wasserstein space." Atti della Accademia Nazionale dei Lincei. Classe di Scienze Fisiche, Matematiche e Naturali. Rendiconti Lincei. Matematica e Applicazioni 15.3-4 (2004): 327-343.


I think I have an answer for the case $p = 1$, $K = 2$. I write "I think" because my computation does not coincide with the example values for $N=4$ posted earlier by the OP in a comment, but I really cannot find any error in my proof, so I wanted to share it.

As already mentioned, we only need to consider permutation measures of the form $\mu = \sum_{i=1}^N \frac{1}{N} \delta_{i,\sigma(i)}$, since these are the extremal points of the set we optimize over.

For any such $\mu$, we can define a coupling $\pi$ by $\pi_{(i,\sigma(i)),(i,j)} = 1/N^2$ for $i,j \in \{1,\dots,N\}$ and $\pi_{(i,j),(i',j')} = 0$ for all index combinations not of this form. It is easy to check that this is an admissible coupling in the definition of $W_1$. The corresponding "value" is

$$ \sum_{1\leq i,j,i',j' \leq N} ||(i,j)-(i',j')||_1 \pi_{(i,j),(i',j')} = \sum_{1 \leq i,j' \leq N} |\sigma(i) - j'| \frac{1}{N^2}. $$

Since $\sigma$ is a permutation, the right-hand side equals $\frac{1}{N^2}\sum_{1 \leq k,j' \leq N} |k - j'|$, so it is the same for every $\sigma$ and bounds the $W_1$ distance of every permutation measure to the uniform distribution from above.

We further show that this value is minimal among all admissible couplings, i.e. that $\pi$ is optimal, if $\mu$ is the monotonic or comonotonic measure, which will yield the claim. I only show this for the monotonic case, $\mu = \frac{1}{N}\sum_{i=1}^N \delta_{i,i}$. Then for any admissible coupling $\hat{\pi}$, by the constraints in the calculation of the Wasserstein distances, we see

\begin{align*}
\sum_{1\leq i,j,i',j' \leq N} ||(i,j)-(i',j')||_1 \hat{\pi}_{(i,j),(i',j')} &= \sum_{1\leq i',j' \leq N} \sum_{1 \leq i \leq N} ||(i,i)-(i',j')||_1 \hat{\pi}_{(i,i),(i',j')} \\
&\geq \sum_{1\leq i',j' \leq N} \sum_{1 \leq i \leq N} ||(i',i')-(i',j')||_1 \hat{\pi}_{(i,i),(i',j')}\\
&= \sum_{1\leq i',j' \leq N} |i'-j'| \frac{1}{N^2},
\end{align*}

where the first equality holds because $\hat{\pi}$ only carries mass from the points $(i,i)$, the inequality is elementwise and follows from the triangle inequality $|i-i'| + |i-j'| \geq |i'-j'|$, and the last equality uses the marginal constraint $\sum_{1 \leq i \leq N} \hat{\pi}_{(i,i),(i',j')} = 1/N^2$. The last term corresponds to the value for the coupling $\pi$.
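As a quick sanity check of the value of the coupling $\pi$, here is a minimal Python sketch that evaluates $\frac{1}{N^2}\sum_{i,j'} |\sigma(i) - j'|$ for every permutation. The function names `coupling_value` and `closed_form` are hypothetical, indices run over $0,\dots,N-1$ instead of $1,\dots,N$ (which does not change any distance), and the closed form $(N^2-1)/(3N)$ is my own evaluation of the double sum, not something stated in the answer.

```python
from fractions import Fraction
from itertools import permutations


def coupling_value(sigma):
    """Cost of the coupling that spreads the atom (i, sigma[i]) uniformly over row i.

    sigma is a tuple that is a permutation of 0..N-1 (0-based indexing is an
    assumption of this sketch; shifting all indices leaves distances unchanged).
    """
    n = len(sigma)
    total = sum(abs(sigma[i] - j) for i in range(n) for j in range(n))
    return Fraction(total, n * n)


def closed_form(n):
    """(N^2 - 1) / (3 N), from sum_{k,j} |k - j| = N (N^2 - 1) / 3."""
    return Fraction(n * n - 1, 3 * n)


if __name__ == "__main__":
    n = 4
    values = {coupling_value(s) for s in permutations(range(n))}
    print(values, closed_form(n))
```

For $N=4$ every permutation gives the same value, $5/4$, which is the upper bound used in the argument.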

Note that this only shows that monotonic/comonotonic measures are maximizers, but not that these are the only maximizers.
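To explore numerically which permutation measures attain the maximum for small $N$, one can compute $W_1$ exactly. The sketch below is not part of the proof and rests on an assumption of mine: after scaling all masses by $N^2$, each atom of $\mu$ carries $N$ units and each grid point needs one unit, so the transport problem becomes an $N^2 \times N^2$ assignment problem, which I solve with a textbook Hungarian algorithm. The names `min_cost_assignment` and `w1_to_uniform` are hypothetical, and indices are 0-based.

```python
from fractions import Fraction
from itertools import permutations


def min_cost_assignment(cost):
    """Minimum total cost of a perfect matching for a square cost matrix
    (Hungarian algorithm with potentials, O(n^3))."""
    n = len(cost)
    inf = float("inf")
    u = [0] * (n + 1)    # row potentials
    v = [0] * (n + 1)    # column potentials
    p = [0] * (n + 1)    # p[j] = row currently matched to column j (0 = none)
    way = [0] * (n + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = inf
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    return sum(cost[p[j] - 1][j - 1] for j in range(1, n + 1))


def w1_to_uniform(sigma):
    """Exact W_1 (L1 ground cost) between the permutation measure of sigma
    and the uniform distribution on the N x N grid."""
    n = len(sigma)
    # Each atom has mass 1/N = N units of size 1/N^2, so replicate it N times.
    sources = [(i, sigma[i]) for i in range(n) for _ in range(n)]
    targets = [(a, b) for a in range(n) for b in range(n)]
    cost = [[abs(s[0] - t[0]) + abs(s[1] - t[1]) for t in targets]
            for s in sources]
    return Fraction(min_cost_assignment(cost), n * n)


if __name__ == "__main__":
    n = 3
    dist = {s: w1_to_uniform(s) for s in permutations(range(n))}
    best = max(dist.values())
    print(best, [s for s, d in dist.items() if d == best])
```

For the monotonic measure this reproduces the value from the proof, e.g. `w1_to_uniform((0, 1, 2))` returns `Fraction(8, 9)`; the printed list shows which permutations reach the maximum for the chosen $N$.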