Do binary symmetric channels maximize mutual information?

No. You can get a higher $I(U;V)$ using asymmetric channels. Below I construct a counterexample, but first a more succinct restatement of the question.


Restatement

To summarize: there is an input $U$ distorted by three independent binary hops, each described by a $2\times 2$ stochastic matrix ($C_L$, $C$, $C_R$ respectively). Labeling all your RVs, $$U\overset{C_L}{\to}X\overset{C}{\to}Y\overset{C_R}{\to}V.$$ We are interested in maximizing $I(U;V)$ subject to the constraints:

  • $C$ is determined by nature;
  • $C_L$ is such that $I(U;X) \leq r_L$;
  • $C_R$ is such that $I(Y;V) \leq r_R$;
  • $U$'s distribution is such that the left channel's output (i.e. $X$) is $B(1/2)$.
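
One bookkeeping fact worth recording (with the row-stochastic convention used in the code at the end: row $i$ of a channel matrix holds the output distribution for input $i$): because the hops form a Markov chain, the composite channel from $U$ to $V$ is just the matrix product, $$P(V=j\mid U=i)=\sum_{x,y}(C_L)_{i,x}\,C_{x,y}\,(C_R)_{y,j}=(C_L\,C\,C_R)_{i,j},$$ so $I(U;V)$ is the mutual information across the single channel $C_L C C_R$ fed with $U$'s distribution.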

Counterexample

You claim the $C_L,C_R$ which produce the maximum are binary symmetric.

  • If $C_L$ is a BSC with $B(1/2)$ output then its input must also be $B(1/2).$ For a given rate $r_L \leq 1$ there is at most one 'positive' (i.e. one that cannot be improved by relabeling the outputs) BSC $C_L$ whose output is $B(1/2).$
  • You have assumed $C$ is a BSC, so with a symmetric input its output is also symmetric.
  • For a rate $r_R\leq 1$ there is only one positive choice for $C_R.$

So to say they are binary symmetric is to determine all of $U$, $C_L$ and $C_R$.
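
A quick sketch of where that unique crossover comes from (fn_h below is the same binary-entropy helper as in the Code section): a positive BSC with crossover $p$ and a $B(1/2)$ input has $I = 1-H(p)$, which is strictly decreasing in $p$ on $[0,1/2]$, so the rate pins $p$ down.

% Crossover of the positive BSC saturating rate r = 0.4 (a sketch)
fn_h = @(p) -p.*log2(p) - (1-p).*log2(1-p); % binary entropy
p = fzero(@(p) 1 - fn_h(p) - 0.4, [1e-6, 0.5]) % p ~ 0.1461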

Now take $C$ to be a perfect channel, $C= \left[\begin{smallmatrix} 1 & 0 \\ 0 & 1 \end{smallmatrix}\right],$ and $r_L=r_R=0.4.$ The associated positive BSC for this rate has crossover probability $\approx 0.146,$ and the end-to-end mutual information can be computed: $$I(U_{BSC}; V_{BSC})< 0.1895$$
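
Spelling out the step behind that number: with $C$ perfect, the BSC baseline is two crossover-$p$ BSCs in cascade, which is itself a BSC with crossover $2p(1-p)$; feeding it the $B(1/2)$ input gives $$I(U_{BSC}; V_{BSC}) = 1 - H\bigl(2p(1-p)\bigr) = 1 - H(0.2495\ldots) \approx 0.1895.$$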

However, trying randomly[1], you can find $U^\ast, C_L^\ast, C_R^\ast$ that satisfy all the mutual information constraints but achieve a greater $U$-to-$V$ mutual information. One I found happens to be quite close to a Z-channel:

$$I(U^\ast; V^\ast) > 0.19,$$ $$C_L^\ast \approx \left[\begin{smallmatrix}0.2493 & 0.7507 \\ 0.9657 & 0.0343 \end{smallmatrix}\right], \qquad C_R^\ast \approx \left[\begin{smallmatrix} 0.9821 & 0.0179 \\ 0.3374 & 0.6626 \end{smallmatrix}\right], \qquad U^\ast \sim B(0.35)$$
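
As a quick numerical check (using the same fn_h/fn_I helpers as in the Code section below and the rounded values printed above, so the figures are approximate; the variable names are just for this snippet):

% Verify the counterexample with the rounded values above
fn_h = @(p) -p.*log2(p) - (1-p).*log2(1-p);
fn_I = @(mtx_bc,v_distn) fn_h(v_distn(1)) + fn_h(v_distn*mtx_bc(:,1)) ...
    - nansum(nansum(-log2(diag(v_distn)*mtx_bc).*(diag(v_distn)*mtx_bc)));
mtxLs = [0.2493, 0.7507; 0.9657, 0.0343];  % C_L*
mtxRs = [0.9821, 0.0179; 0.3374, 0.6626];  % C_R*
v_u   = [0.65, 0.35];                      % U* ~ B(0.35), i.e. P(U=1) = 0.35
fn_I(mtxLs, v_u)        % I(U;X) ~ 0.398 <= 0.4
fn_I(mtxRs, v_u*mtxLs)  % I(Y;V) ~ 0.399 <= 0.4  (Y = X since C is perfect)
fn_I(mtxLs*mtxRs, v_u)  % I(U;V) ~ 0.190 >  0.1895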


Discussion

This result is to be expected: there is a vague sense in which uniform noise over a bounded space is the most degrading, even holding mutual information fixed (by one heuristic, "uniform noise means you can't precode to mitigate it").

A gentle introduction to a good, visualisable framework for studying binary channels is given in a short paper, Algebraic Information Theory for Binary Channels by Martin, Moskowitz and Allwein. Under this framework your maximization can be restated as a convex optimization problem, for which I see no easy special cases.

An easier-to-investigate (and arguably more interesting) problem is the same as yours but with the fourth constraint, $X\sim B(1/2)$, omitted. But I could not find an easy path towards an answer for this either.

For both of these there might be some magical connection to KL divergence which I am not seeing.


Code

[1]: Below is a crude counterexample finder.

% Helper functions
    % Binary entropy
fn_h = @(p) -p.*log2(p) - (1-p).*log2(1-p); 
    % MI across channel mtx_bc (rows = inputs, cols = outputs) when v_distn
    % is the input distribution: I = H(in) + H(out) - H(in,out);
    % nansum treats the 0*log2(0) terms as 0
fn_I = @(mtx_bc,v_distn) fn_h(v_distn(1)) + fn_h(v_distn*mtx_bc(:,1)) ...
    - nansum(nansum(-log2(diag(v_distn)*mtx_bc).*(diag(v_distn)*mtx_bc)));
    % Channel matrix when P(out=0|in=0)=pa, P(out=0|in=1)=pb
fn_mtxBC = @(pa,pb) [pa, 1-pa; pb, 1-pb];

% Set params
d_r_L = 0.4; 
d_r_R = 0.4;
d_xp = 0.146102; % solution to 1-H(p) = 0.4
mtxBSC = fn_mtxBC(1-d_xp, d_xp); % positive BSC with crossover d_xp

% Search 
while true
    mtxL = fn_mtxBC(rand, rand);
    mtxR = fn_mtxBC(rand, rand);
    % Input distribution making the output of mtxL uniform (constraint 4)
    v_d = (mtxL'\[0.5, 0.5]')';
    if (abs(sum(v_d)-1) > 0.001 || ...
        min(v_d) < 0)
        continue  % no valid input distribution for this mtxL
    end
    % Rate constraints on the two outer channels
    if(fn_I(mtxL, v_d)      > d_r_L || ...
       fn_I(mtxR, v_d*mtxL) > d_r_R)
        continue;
    end
    fprintf('+\n');  % feasible candidate
    % C is perfect, so the composite channel is mtxL*mtxR; does it beat
    % the all-BSC baseline end-to-end?
    if fn_I(mtxL*mtxR, v_d) > fn_I(mtxBSC*mtxBSC, [0.5, 0.5])
        break;  % counterexample found in mtxL, mtxR, v_d
    end
end
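
When the loop breaks, mtxL, mtxR and v_d hold a counterexample; to actually see it, one might append something along these lines (not part of the original search):

% Report the counterexample that broke the loop
fprintf('I(U;V) = %.4f  (BSC baseline %.4f)\n', ...
    fn_I(mtxL*mtxR, v_d), fn_I(mtxBSC*mtxBSC, [0.5, 0.5]));
disp(mtxL); disp(mtxR); disp(v_d);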