How exactly does the proof of Bell's theorem fail if you remove the locality assumption?

In my derivation, I make my error at equation $(2)$, attempting to extend the logic employed by Bell in arriving at equation $(1)$.

Bell's local derivation uses the assumption that the system being observed is in an anticorrelated state to obtain the equality

\begin{equation} A(\mathbf{a}, \lambda) = -B(\mathbf{a}, \lambda), \end{equation}

in which $\mathbf{a}$ represents a specific choice of measurement angle. However, there is no dependence on another angle $\mathbf{b}$ in the above, and so it is just as general as writing the equality

\begin{equation} A(\mathbf{\beta}, \lambda) = -B(\mathbf{\beta}, \lambda). \end{equation}

This allows us to obtain expression $(1)$:

\begin{equation} P(\mathbf{\alpha}, \mathbf{\beta}) = -\int \rho A(\mathbf{\alpha}, \lambda) A(\mathbf{\beta}, \lambda)\, d\lambda. \end{equation}

In the nonlocal derivation, however, $A = A(\mathbf{\alpha}, \mathbf{\beta}, \lambda)$ and $B = B(\mathbf{\beta}, \mathbf{\alpha}, \lambda)$ have nonlocal dependence on two angles, not just one. The assumption of the singlet state gives us

\begin{equation} A(\mathbf{a}, \mathbf{a}, \lambda) = -B(\mathbf{a}, \mathbf{a}, \lambda). \end{equation}

In the above, $A$ and $B$ are guaranteed to be opposite only when Alice and Bob choose the same measurement angle, that is, when $\mathbf{\alpha} = \mathbf{\beta}$, and so the above can be written

\begin{equation} A(\mathbf{\beta}, \mathbf{\beta}, \lambda) = -B(\mathbf{\beta}, \mathbf{\beta}, \lambda) \neq -B(\mathbf{\beta}, \mathbf{\alpha}, \lambda). \end{equation}

It is important to note that, because $A$ and $B$ each depend on two angles, the anticorrelation relation holds only when the two angles are the same. In the expression $P(\mathbf{\alpha}, \mathbf{\beta}) = \int \rho A(\mathbf{\alpha}, \mathbf{\beta}, \lambda) B(\mathbf{\beta}, \mathbf{\alpha}, \lambda)\, d\lambda$, the factor $B(\mathbf{\beta}, \mathbf{\alpha}, \lambda)$ therefore cannot be replaced by $-A(\mathbf{\beta}, \mathbf{\beta}, \lambda)$ to obtain expression $(2)$:

\begin{equation} P(\mathbf{\alpha}, \mathbf{\beta}) = \int \rho A(\mathbf{\alpha}, \mathbf{\beta}, \lambda) B(\mathbf{\beta}, \mathbf{\alpha}, \lambda)\, d\lambda \neq -\int \rho A(\mathbf{\alpha}, \mathbf{\beta}, \lambda) A(\mathbf{\beta}, \mathbf{\beta}, \lambda)\, d\lambda. \end{equation}

This inability to rewrite $P(\mathbf{\alpha}, \mathbf{\beta})$ for the singlet state halts the nonlocal derivation when one tries to apply the same steps Bell used in his local derivation.
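To make the failed substitution concrete, here is a minimal sketch of a toy nonlocal model. It is not taken from Bell's paper: the functions `A`, `B`, `average`, `P`, and `P_substituted`, and the choice of a two-component $\lambda$ uniform on the unit square, are assumptions made purely for illustration. The model satisfies $A(\mathbf{\beta}, \mathbf{\beta}, \lambda) = -B(\mathbf{\beta}, \mathbf{\beta}, \lambda)$ for every $\lambda$, yet the substituted integral does not reproduce $P(\mathbf{\alpha}, \mathbf{\beta})$.

```python
import math
import random

def A(alpha, beta, lam):
    """Alice's outcome. In this toy model it ignores both angles."""
    s, _u = lam
    return 1 if s < 0.5 else -1

def B(beta, alpha, lam):
    """Bob's outcome. It depends on BOTH angles, which is the nonlocal part."""
    _s, u = lam
    # Bob is opposite to Alice with probability cos^2((alpha - beta) / 2).
    opposite = u < math.cos((alpha - beta) / 2) ** 2
    return -A(alpha, beta, lam) if opposite else A(alpha, beta, lam)

def average(f, n=100_000, seed=0):
    """Monte Carlo average of f(lam), with lam uniform on the unit square."""
    rng = random.Random(seed)
    return sum(f((rng.random(), rng.random())) for _ in range(n)) / n

def P(alpha, beta):
    """Correlation: integral of rho * A(alpha, beta, lam) * B(beta, alpha, lam)."""
    return average(lambda lam: A(alpha, beta, lam) * B(beta, alpha, lam))

def P_substituted(alpha, beta):
    """The invalid form: minus the integral of rho * A(alpha, beta, lam) * A(beta, beta, lam)."""
    return average(lambda lam: -A(alpha, beta, lam) * A(beta, beta, lam))

alpha, beta = 0.0, math.pi / 3
print(P(beta, beta))               # perfect anticorrelation at equal angles
print(P(alpha, beta))              # close to -cos(alpha - beta)
print(P_substituted(alpha, beta))  # stuck at -1, so the substitution fails
```

In this sketch the equal-angle condition holds exactly, but $P(\mathbf{\alpha}, \mathbf{\beta})$ tracks $-\cos(\mathbf{\alpha} - \mathbf{\beta})$ while the substituted expression stays at $-1$ for every pair of angles.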

Side note: This does not prove that no other approach could yield a Bell inequality under the assumption of nonlocality, but proving that was not my purpose.


Despite this, both $\int \rho\, A(a,c,\lambda) A(c,a,\lambda) A(a,b,\lambda) A(b,a,\lambda)\, d\lambda$ and $P(b,c)$ are restricted to the range $-1 \leq x \leq 1$, so both inequalities should lead to the same experimental conclusions regarding local realism.

This does not follow. The fact that a correlation must obey $-1 \leq x \leq 1$ is less restrictive than Bell's inequality. As far as I can tell you didn't make any mistake: the first derivation gives the bound that the correlations of a local theory must obey, while the second contains a term you can't reduce to $P(b, c)$, which is what allows a nonlocal theory to violate the inequality.

In any case, I found Bell's original derivation hard to follow until I understood the inequality another way. He also misuses conditional probabilities as noted by E.T. Jaynes (though I think the error is ultimately not fatal).

I offer the following derivation if you wish to use it for understanding Bell more clearly in hindsight. Consider three ordered lists containing elements $-1$ and $1$.

$$a=\{1, 1, -1, \dots\}$$

$$b=\{1, -1, 1, \dots\}$$

$$c=\{-1, 1, 1, \dots\}$$

Denoting elements $a_i$, $b_i$ and $c_i$, we have:

Since $b_i^2=1$:

$$a_ib_i-a_ic_i=a_ib_i-a_ib_i^2c_i=a_ib_i(1-b_ic_i)$$

Taking absolute values and using $|a_ib_i|=1$:

$$\implies |a_ib_i-a_ic_i|=|1-b_ic_i|$$

Since $1-b_ic_i$ is never negative, we may drop the absolute value on the RHS:

$$\implies |a_ib_i-a_ic_i|=1-b_ic_i$$

Now average over $i$, denoting

$$\langle ab\rangle =\frac{1}{N}\sum_{i=1}^N a_ib_i$$

so that the RHS averages to $1-\langle bc\rangle$. For the LHS, use the fact that for any real numbers $x_i$

$$\sum_{i=1}^N|x_i|\geq\left|\sum_{i=1}^N x_i\right|$$

with $x_i=a_ib_i-a_ic_i$. We obtain:

$$|\langle ab\rangle -\langle ac\rangle|\leq 1 - \langle bc \rangle$$

This holds identically. For any three lists $a$, $b$, and $c$ with entries $\pm 1$, the inequality is always satisfied.
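As a quick sanity check of the inequality above, the following sketch draws random $\pm 1$ lists and confirms that the bound is never broken when all three lists are fully known. The helper name `bell_gap`, the list length, and the number of trials are arbitrary choices for illustration. The comparison is done on integer sums so that floating-point rounding cannot produce a spurious violation.

```python
import random

def bell_gap(a, b, c):
    """Return (1 - <bc>) - |<ab> - <ac>|, which is >= 0 when the inequality holds."""
    n = len(a)
    ab = sum(x * y for x, y in zip(a, b))
    ac = sum(x * y for x, y in zip(a, c))
    bc = sum(x * y for x, y in zip(b, c))
    # Compare integer sums first, then normalise by n.
    return ((n - bc) - abs(ab - ac)) / n

rng = random.Random(1)
n = 1000
gaps = []
for _ in range(200):
    # Three independent random lists of +1 / -1 entries.
    a, b, c = ([rng.choice((-1, 1)) for _ in range(n)] for _ in range(3))
    gaps.append(bell_gap(a, b, c))

min_gap = min(gaps)
print(min_gap >= 0)  # True: no violation in any trial
```

The bound is saturated, for example, when all three lists are identical, in which case both sides are zero.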

However, you might find yourself in the peculiar situation that you can only sample two of these lists at a time for any element $i$. Now there is a chance it will be violated, but if we assume that those three lists of numbers exist in principle (a.k.a. hypothetical measurements), then violations can only happen up to statistical fluctuations of the order $\sim 1/\sqrt{N}$.
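The effect of sampling only two lists at a time can be sketched numerically. In the example below (the function name `pairwise_estimates`, the list length, and the choice of setting $c = b$ are all illustrative assumptions), each index $i$ contributes to just one randomly chosen pair. Because $b$ and $c$ are identical, the full lists saturate the inequality at $0 \leq 0$, so any sampling noise shows up as a small apparent violation.

```python
import random

def pairwise_estimates(a, b, c, rng):
    """Estimate <ab>, <ac>, <bc> when only one pair is read at each index i."""
    lists = {"a": a, "b": b, "c": c}
    sums = {"ab": 0, "ac": 0, "bc": 0}
    counts = {"ab": 0, "ac": 0, "bc": 0}
    for i in range(len(a)):
        pair = rng.choice(("ab", "ac", "bc"))  # which two lists we look at
        x, y = pair
        sums[pair] += lists[x][i] * lists[y][i]
        counts[pair] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}

rng = random.Random(2)
n = 30_000
a = [rng.choice((-1, 1)) for _ in range(n)]
b = [rng.choice((-1, 1)) for _ in range(n)]
c = list(b)  # identical to b, so the full-list inequality reads 0 <= 0

est = pairwise_estimates(a, b, c, rng)
# Positive values mean the sampled correlations break the inequality.
violation = abs(est["ab"] - est["ac"]) - (1 - est["bc"])
print(violation)  # small and non-negative, of order 1/sqrt(n)
```

Here $\langle ab \rangle$ and $\langle ac \rangle$ are estimated from disjoint subsets of indices, so they differ slightly even though the underlying lists give identical values.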

QM violates this, which apparently means that those three lists don't exist, even in principle.
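For a concrete instance of the violation, the sketch below plugs in the quantum prediction for a spin singlet. It assumes the standard singlet correlation $-\cos(\theta_x - \theta_y)$ with one party's outcomes sign-flipped, so that equal settings give perfect correlation and the effective correlation is $+\cos(\theta_x - \theta_y)$, matching the form of the inequality used here. The three angles are an illustrative choice, not something fixed by the argument.

```python
import math

def qm_corr(x, y):
    """Singlet correlation with one party's sign flipped: cos(x - y)."""
    return math.cos(x - y)

# Three measurement directions, 60 degrees apart (illustrative choice).
a, b, c = 0.0, math.pi / 3, 2 * math.pi / 3

lhs = abs(qm_corr(a, b) - qm_corr(a, c))  # |<ab> - <ac>|, about 1.0
rhs = 1 - qm_corr(b, c)                   # 1 - <bc>, about 0.5
print(lhs > rhs)                          # True: the inequality is violated
```

With these angles the left side is about twice the right side, far beyond anything a $1/\sqrt{N}$ fluctuation could explain for large $N$.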

Of course, these lists also might not exist for variables that somehow communicate with each other, or if the system knew ahead of time which pair we were going to measure and conspired against us. In both of these cases, if we measure $a$ and $b$ we can't talk about what $c$ would have been, because the choice of what we measure plays an active role in the outcome.