In QFT, why does a vanishing commutator ensure causality?

Recall that commuting observables in quantum mechanics are simultaneously observable. If I have observables A and B, and they commute, I can measure A and then B and the results will be the same as if I measured B and then A (if you insist on being precise, by "the same" I mean in a statistical sense, where I take averages over many identical experiments). If they don't commute, the results will in general not be the same: measuring A and then B will produce different results than measuring B and then A. So if I only have access to A and my friend only has access to B, by measuring A several times I can determine whether or not my friend has been measuring B.
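Here is a minimal numerical sketch of that statement, using qubits as a toy stand-in for the field observables. The choice of Pauli matrices, the state, and the helper name `measure_nonselective` are my own assumptions for illustration. "My friend measures B" is modeled as a measurement whose outcome I never learn, so the state is projected onto each eigenspace of B and the results are summed.

```python
import numpy as np

# Pauli matrices: Z and X do not commute.
Z = np.array([[1, 0], [0, -1]], dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def measure_nonselective(rho, observable):
    """State after `observable` is measured and the outcome is discarded."""
    vals, vecs = np.linalg.eigh(observable)
    out = np.zeros_like(rho)
    # Project onto whole eigenspaces, so degenerate eigenvalues are handled.
    for val in np.unique(np.round(vals, 10)):
        cols = vecs[:, np.isclose(vals, val)]
        projector = cols @ cols.conj().T
        out += projector @ rho @ projector
    return out


def mean(rho, observable):
    """Average outcome of `observable` over many identical experiments."""
    return np.trace(rho @ observable).real


# Case 1: A = Z and B = X act on the same qubit and do not commute.
rho1 = np.array([[1, 0], [0, 0]], dtype=complex)  # the state |0><0|
mean_before_1 = mean(rho1, Z)
mean_after_1 = mean(measure_nonselective(rho1, X), Z)

# Case 2: A and B act on different qubits of an entangled pair and commute.
theta = 0.3
psi = np.array([np.cos(theta), 0, 0, np.sin(theta)], dtype=complex)
rho2 = np.outer(psi, psi.conj())
A2 = np.kron(Z, I2)  # my observable, on qubit 1
B2 = np.kron(I2, X)  # my friend's observable, on qubit 2
mean_before_2 = mean(rho2, A2)
mean_after_2 = mean(measure_nonselective(rho2, B2), A2)

print("non-commuting:", mean_before_1, "->", mean_after_1)
print("commuting:    ", mean_before_2, "->", mean_after_2)
```

In the first case the average of A drops from 1 to 0 once B has been measured, so the statistics of A reveal the measurement. In the second case the average of A is unchanged, even though the two qubits are entangled.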

Thus it is crucial that if A and B do not commute, they are not spacelike separated. Or, to remove the double negatives, it is crucial that A and B commute if they are spacelike separated. Otherwise I can tell by doing measurements of A whether or not my friend is measuring B, even though light could not have reached me from B. Then, with the magic of a Lorentzian spacetime, I could end up traveling to my friend, arriving before he observed B, and stopping him from making the observation.

The correlation function you wrote down, the one without the commutator, is indeed nonzero. This represents the fact that values of the field at different points in space are correlated with one another. This is completely fine: after all, there are events that are common to both in their past light cones, if you go back far enough, so they have not had completely independent histories. But the point is that these correlations did not arise because you made measurements. You cannot access these correlations by doing local experiments at a fixed spacetime point; you can only see them by measuring field values at spatial location x and then comparing notes with your friend who measured field values at spatial location y. You can only compare notes once you have had time to travel close to each other. The vanishing commutator guarantees that your measurements at x did not affect her measurements at y.
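To make the distinction concrete, here is a sketch for a free real scalar field $$\phi$$ of mass $$m$$, with the convention that spacelike intervals have $$(x-y)^2 < 0$$. Both choices are my assumptions for illustration, since the field in the question may differ; the expressions themselves are the standard textbook ones.

$$\begin{aligned} D(x-y) &:= \langle 0|\phi(x)\phi(y)|0\rangle = \int \frac{\mathrm{d}^3 p}{(2\pi)^3}\,\frac{1}{2E_{\vec{p}}}\; e^{-i p\cdot(x-y)}\,, \\ \big[\phi(x)\,,\phi(y)\big] &= D(x-y) - D(y-x)\,. \end{aligned}$$

For spacelike $$x-y$$ the function $$D(x-y)$$ is nonzero (at large spatial separation $$r$$ it falls off roughly like $$e^{-m r}$$), and that is the correlation described above. However, $$D$$ is Lorentz invariant, and for a spacelike separation there is a continuous Lorentz transformation that takes $$x-y$$ to $$-(x-y)$$. Hence $$D(x-y) = D(y-x)$$ there, and the commutator vanishes even though the correlation does not.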

It is dangerous to think of fields as creating particles at spacetime locations, because you can't localize a relativistic particle in space to a greater precision than its Compton wavelength. If you are thinking of fields in position space, it is better to think of what you are measuring as a field and not think of particles at all.

(Actually I should say that I don't think you could actually learn that your friend was measuring B at y by only doing measurements of A. But the state of the field would change, and the evolution of the field would be acausal. I think this is a somewhat technical point; the main idea is that you don't want to be able to affect what the field is doing OVER THERE outside the light cone by doing measurements RIGHT HERE, because you get into trouble with causality.)

If you'd like to see a small computation to show why microcausality is related to the vanishing of the commutator, here is a simple exercise that one can do.

Consider some operator $$A(\vec{x},t)$$ whose expectation value I want to measure in some state $$\psi$$:

$$\mathcal{E}_A(\vec{x},t) := \langle \psi|A(\vec{x},t)|\psi\rangle\,.$$

Now give a "kick" to the Hamiltonian at a certain time $$t_0$$ (let's assume $$t_0 = 0$$). By that I mean that we perturb the Hamiltonian by some operator that is nonzero only for $$t > 0$$, namely

$$H = H_0 + \theta(t)\, V(t)\,.$$

How does the expectation value of $$A$$ change after this perturbation? The most convenient approach is the interaction picture, so let's use that. Without reviewing the details, we define the interaction-picture state $$|\psi\rangle$$ and operators $$\mathcal{O}$$ by applying $$\exp(i H_0 t)$$ to their Schrödinger-picture counterparts:

$$\psi_{\mathrm{int}}(t) = e^{i H_0 t} \psi_{\mathrm{S}}(t)\,,\qquad H_{\mathrm{int}}(t) = e^{i H_0 t}H e^{-i H_0 t}\,.$$

From here on, $$V(t)$$ and $$A(\vec{x},t)$$ denote interaction-picture operators. The time evolution operator $$U(t,t_0)$$ must satisfy

$$i \frac{\mathrm{d}}{\mathrm{d}t} \psi_{\mathrm{int}}(t) := i \frac{\mathrm{d}}{\mathrm{d}t} U(t,t_0) \psi_{\mathrm{int}}(t_0) = \theta(t) V(t)\,U(t,t_0) \psi_{\mathrm{int}}(t_0)\,,$$

where the first equality is a definition of $$U$$ and the second its differential equation. To first order in the perturbation $$V$$ the solution is

$$U(t,t_0) = \mathbb{1} - i \int_{t_0}^t\mathrm{d}t' \,V(t') + O(V^2)\,,\qquad \forall\;t \geq t_0 \geq 0\,.$$

So far all standard. The expectation value can then be seen to transform as

$$\begin{aligned} \mathcal{E}_A(\vec{x},t) &= \langle\psi(t)|A(\vec{x},t)|\psi(t)\rangle \\&= \langle \psi|U^\dagger(t,0)\, e^{i H_0 t} A(\vec{x},0)\, e^{-i H_0 t}\, U(t,0) |\psi\rangle \\& \simeq \mathcal{E}_A(\vec{x},0) - i\int_0^t \mathrm dt'\langle \psi| A(\vec{x},t) V(t') - V(t') A(\vec{x},t) | \psi \rangle\,. \end{aligned}$$

Here I simply used all the definitions of the interaction picture and expanded to first order in $$V$$. The zeroth-order term $$\langle\psi|A(\vec{x},t)|\psi\rangle$$ equals $$\mathcal{E}_A(\vec{x},0)$$ when $$\psi$$ is stationary under $$H_0$$, as the vacuum is. Now let us make a physical assumption.
(This first-order expansion is similar to what one does in linear response theory; see the Kubo formula, for instance.)
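As a sanity check on the first-order expansion above, here is a small numerical sketch on a finite-dimensional toy system. Everything in it is an assumption made for illustration: the random 4×4 Hermitian matrices standing in for $$H_0$$, $$V$$ and $$A$$, the choice of a perturbation that is constant in the Schrödinger picture for $$t > 0$$, and the helper names. It compares the exact change in the expectation value of $$A$$ with $$-i\int_0^t \mathrm{d}t'\,\langle\psi|[A(t), V(t')]|\psi\rangle$$.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_hermitian(n):
    """Random Hermitian matrix, rescaled to unit spectral norm."""
    m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    h = (m + m.conj().T) / 2
    return h / np.linalg.norm(h, 2)


def propagator(h, t):
    """exp(-i h t) for Hermitian h, via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * w * t)) @ v.conj().T


def interaction_picture(op, h0, t):
    """e^{i h0 t} op e^{-i h0 t}."""
    u0 = propagator(h0, t)
    return u0.conj().T @ op @ u0


n, eps, t_final = 4, 1e-3, 1.0
H0, V, A = (random_hermitian(n) for _ in range(3))
psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)

# Exact answer: evolve the Schrodinger state with H0 + eps * V.
psi_t = propagator(H0 + eps * V, t_final) @ psi
perturbed = np.vdot(psi_t, A @ psi_t).real
A_t = interaction_picture(A, H0, t_final)
unperturbed = np.vdot(psi, A_t @ psi).real
exact_shift = perturbed - unperturbed

# First-order answer: -i * eps * integral of <psi|[A(t), V(t')]|psi> dt'.
times = np.linspace(0.0, t_final, 2001)
values = []
for s in times:
    V_s = interaction_picture(V, H0, s)
    values.append(np.vdot(psi, (A_t @ V_s - V_s @ A_t) @ psi))
values = np.array(values)
dt = times[1] - times[0]
integral = dt * (values.sum() - 0.5 * (values[0] + values[-1]))  # trapezoid rule
first_order_shift = (-1j * eps * integral).real

print("exact shift:      ", exact_shift)
print("first-order shift:", first_order_shift)
```

The two numbers should agree up to terms of order `eps**2`, which is what the expansion promises. Note that the sketch compares against the unperturbed value at time $$t$$, so it does not need the state to be stationary under $$H_0$$.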

The perturbation $$V$$ that I defined to be a "kick" happens not only at some specific time, but also at a specific location. Therefore it will modify the Hamiltonian as the integral of some local operator $$B$$, namely

$$V(t) = \int \mathrm{d}^{d-1} x\, B(\vec{x},t)\,.$$

From this one has

$$\mathcal{E}_A(\vec{x},t) - \mathcal{E}_A(\vec{x},0) = -i\int_0^t\mathrm{d}t'\int \mathrm{d}^{d-1} x'\,\langle\psi|\big[A(\vec{x},t)\,, B(\vec{x}{}',t')\big]|\psi\rangle\,.$$

Here you see immediately that microcausality implies that the commutator has to vanish outside the light cone. Suppose that $$B$$ creates a perturbation at some location in spacetime. It is impossible for $$A$$ to know about it if the two are spacelike separated: you would have to wait at least the time it takes for light to get there in order to have a change in the expectation value. Therefore the only way to preserve causality is to require

$$\big[A(\vec{x},t)\,, B(\vec{x}{}',t')\big] = 0\quad \mathrm{if}\; (x-x')^2 < 0\,.$$

A simple contradiction that one might cook up is the following: tell a friend to either make a perturbation to the Hamiltonian at time $$t = 0$$ or not. Then you place yourself spacelike separated from your friend. If $$A$$ and $$B$$ do not commute, you can infer whether your friend has decided to perturb the Hamiltonian just by measuring $$\mathcal{E}_A$$. And as you might know, this leads to all sorts of paradoxes in special relativity.