Lower bounding decoding error in a noisy adversarial channel

Consider the special case where $|\mathcal X| = 2$ and $k=1$: if you pick $P_1$, the adversary picks $P_2$ and generates a sample $Y^n = (Y_1,\dots,Y_n)$ drawn i.i.d. from $P_2$, and vice versa. Your question then reduces to the minimal error of a (Bayesian) binary hypothesis test. By Le Cam's lemma, $$ \inf_{\psi} \mathbb P\big( \psi(Y^n) \neq X\big) = \frac12\big( 1 - \| P_1^{\otimes n} - P_2^{\otimes n}\|_{\text{TV}}\big). $$ If $P_1 \neq P_2$, then $\| P_1^{\otimes n} - P_2^{\otimes n}\|_{\text{TV}} \to 1$ as $n \to \infty$: the two product distributions eventually separate, the minimum error probability goes to zero, and consistent hypothesis testing is possible. (So you shouldn't expect a nonzero lower bound in the limit.)
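As a quick numerical sanity check (not needed for the argument): for two Bernoulli distributions the total variation distance between the $n$-fold products can be computed exactly, since all samples with the same number of ones have the same probability under each product measure. The sketch below (plain Python; the parameters $p_1 = 0.4$, $p_2 = 0.6$ are just an illustrative choice, not from the problem) evaluates Le Cam's bound $\tfrac12\big(1 - \|P_1^{\otimes n}-P_2^{\otimes n}\|_{\text{TV}}\big)$ and shows it shrinking with $n$.

```python
from math import comb

def lecam_bound(p1, p2, n):
    """Exact Le Cam bound 0.5 * (1 - TV(P1^n, P2^n)) for two Bernoulli laws.

    All samples with the same number of ones have the same probability under
    a product Bernoulli measure, so the TV distance reduces to a sum over counts.
    """
    tv = 0.5 * sum(
        comb(n, s) * abs(p1**s * (1 - p1)**(n - s) - p2**s * (1 - p2)**(n - s))
        for s in range(n + 1)
    )
    return 0.5 * (1 - tv)

# Illustrative parameters (an arbitrary choice): p1 = 0.4, p2 = 0.6.
for n in (1, 10, 50, 200):
    print(n, round(lecam_bound(0.4, 0.6, n), 4))
```

The printed lower bound decreases toward zero as $n$ grows, as expected.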


For the general case, it is enough to consider how small $\mathbb P(\psi(U) \neq X)$ can be made, where $U \in \binom{[m]}{k}$ is the adversary's set of $k$ indices and $\psi : \binom{[m]}{k} \to [m]$ is a decision rule based on $U$ alone. Here $\binom{[m]}{k}$ denotes the set of all subsets of $[m] = \{1,2,\dots,m\}$ of size $k$. That is, we want the optimal decision rule $$ \psi^* = \arg\min_\psi \mathbb P(\psi(U) \neq X). $$ This is a Bayesian decision problem with 0-1 loss, and it is well known that the optimal rule minimizes the posterior risk (see Lehmann and Casella, Theorem 1.1 in Chapter 4, p. 228), i.e., \begin{align} \psi^*(u) &= \arg \min_{j \in [m]} \mathbb P( j \neq X\mid U= u) \\ &= \arg \max_{j \in [m]} \mathbb P( j = X\mid U= u). \end{align}

We have \begin{align*} \mathbb P( j = X\mid U= u) = \begin{cases} \frac{1}{m-k} & j \notin u \\ 0 & j \in u. \end{cases} \end{align*} For any $u$, an optimal decision rule can pick any $j \notin u$ to maximize the posterior probability of a correct decision (so the rule is not unique). Let $j(u)$ be the smallest element not in $u$. Then $\psi^*(u) = j(u)$ is an optimal rule, and for it $$ \mathbb P(\psi^*(U) = X \mid U = u) = \frac{1}{m-k}. $$ Taking the expectation over $U$ (the smoothing/tower property), it follows that $$ \mathbb P(\psi^*(U) = X) = \frac{1}{m-k}. $$ Thus, $$ \mathbb P(\psi(U) \neq X) \ge \mathbb P(\psi^*(U) \neq X) = 1-\frac1{m-k}. $$
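The calculation only uses the conditional law of $X$ given $U$. As a sanity check one can pick a joint model consistent with it, say $X$ uniform on $[m]$ and $U$ a uniform random $k$-subset of the remaining indices (this specific joint model is an assumption made only for the simulation), and verify that the "smallest index not in $u$" rule succeeds with probability $\tfrac1{m-k}$:

```python
import random

def success_rate(m=10, k=3, trials=200_000, seed=0):
    """Monte Carlo check that psi*(u) = smallest index not in u succeeds
    with probability 1/(m-k).

    Assumed joint model (consistent with the posterior above): X uniform on
    {1,...,m}, U a uniform k-subset of the indices other than X.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.randrange(1, m + 1)
        u = set(rng.sample([j for j in range(1, m + 1) if j != x], k))
        psi_star = min(j for j in range(1, m + 1) if j not in u)
        hits += psi_star == x
    return hits / trials

print(success_rate(), "vs", 1 / (10 - 3))
```

With $m=10$ and $k=3$ this prints roughly $0.143 \approx 1/7$.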


To see why the above is enough, note that the error based on $Y^n$ alone is bounded below by the error in the (easier) problem where we have access to both $Y^n$ and $U$, since any $\phi(Y^n)$ is a special case of $\widetilde\psi(Y^n,U)$: $$ \min_\phi \mathbb P(\phi(Y^n) \neq X) \ge \min_{\widetilde\psi} \mathbb P(\widetilde\psi(Y^n,U) \neq X). $$ The optimal solution of the latter problem is again the minimizer of the posterior risk, \begin{align*} \widetilde\psi^*(Y^n,U) &= \arg\min_{j \in [m]} \mathbb P( j \neq X \mid Y^n, U) \\ &=\arg\min_{j \in [m]} \mathbb P( j \neq X \mid U), \end{align*} where the second line holds by the Markov property ($X$ is independent of $Y^n$ given $U$). It follows that the problem with access to both $Y^n$ and $U$ has the same optimal rule, and the same Bayes error, as the case where we only observe $U$: $$ \min_{\widetilde\psi} \mathbb P\big(\widetilde \psi(Y^n,U) \neq X\big) = \min_\psi \mathbb P\big( \psi(U) \neq X\big) =1-\frac{1}{m-k}. $$
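To make this concrete, here is a small simulation under one assumed generative model (not specified in the problem, used only for illustration): the $m$ candidate distributions are Bernoulli with parameters $p_j = j/(m+1)$, and the adversary draws $Y^n$ i.i.d. from a uniformly chosen member of $U$, so that $X \to U \to Y^n$ holds. A likelihood decoder that sees only $Y^n$ does markedly worse than $1-\tfrac{1}{m-k}$, and a likelihood decoder that also sees $U$ still only matches the bound.

```python
import math
import random

def error_rates(m=10, k=3, n=50, trials=50_000, seed=1):
    """Assumed model (illustration only): candidates are Bernoulli(p_j) with
    p_j = j/(m+1); X uniform; U a uniform k-subset of the other indices; the
    adversary draws Y^n i.i.d. from a uniformly chosen member of U."""
    rng = random.Random(seed)
    p = [j / (m + 1) for j in range(1, m + 1)]
    err_y = err_yu = 0
    for _ in range(trials):
        x = rng.randrange(m)                                # your index (0-based)
        u = rng.sample([j for j in range(m) if j != x], k)  # adversary's set U
        src = rng.choice(u)                                 # adversary's source in U
        s = sum(rng.random() < p[src] for _ in range(n))    # number of ones in Y^n

        def loglik(j):
            return s * math.log(p[j]) + (n - s) * math.log(1 - p[j])

        err_y += max(range(m), key=loglik) != x                                # sees Y^n only
        err_yu += max((j for j in range(m) if j not in u), key=loglik) != x    # sees Y^n and U
    return err_y / trials, err_yu / trials

err_y, err_yu = error_rates()
print(err_y, err_yu, "bound:", 1 - 1 / (10 - 3))
```

Both empirical error rates stay at or above $1-\tfrac17 \approx 0.857$, consistent with the bound.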


The above argument in fact shows that whenever $X \to U \to Y^n$ (i.e., $X$ is independent of $Y^n$ given $U$), we have $$ \min_{\phi} \mathbb P \big(\phi(Y^n) \neq X\big) \ge \min_{\psi} \mathbb P \big(\psi(U) \neq X\big), $$ which looks like a Bayesian decision-theoretic version of the data processing inequality.
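Written out in one chain for a discrete $X$ (the same steps as above, with no assumptions beyond the Markov property): $$ \min_{\phi} \mathbb P\big(\phi(Y^n) \neq X\big) \;\ge\; \min_{\widetilde\psi} \mathbb P\big(\widetilde\psi(Y^n,U) \neq X\big) \;=\; \mathbb E\Big[1 - \max_{x} \mathbb P\big(X=x \mid Y^n, U\big)\Big] \;=\; \mathbb E\Big[1 - \max_{x} \mathbb P\big(X=x \mid U\big)\Big] \;=\; \min_{\psi} \mathbb P\big(\psi(U) \neq X\big), $$ where the inequality holds because any $\phi(Y^n)$ is a particular $\widetilde\psi(Y^n,U)$, and the middle equality is the conditional independence.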