Why do coherent states have Poisson number distribution?

Short version

$\newcommand{\ket}[1]{\lvert #1 \rangle}\newcommand{\Ket}[1]{\left| #1 \right>} % $Because you can use beamsplitters to split a coherent state into a tensor product of many independent coherent states, each with a low photon number.

Longer version

If you send $\ket{\alpha}$ onto a beamsplitter with transmission coefficient $t$ and reflection coefficient $r$ (with $|r|^2+|t|^2=1$), you obtain the product of two independent coherent states $\ket{t\alpha}\otimes\ket{r\alpha}$. This property characterizes coherent states, since any other pure input state leads to entanglement between the outputs of the beamsplitter.
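For readers who want to see where the product state comes from, here is a short sketch of the calculation. It assumes one common convention, which is not spelled out above: the beamsplitter maps the creation operator of the input mode as $a^\dagger \to t\,b_1^\dagger + r\,b_2^\dagger$, where $b_1, b_2$ are my names for the two output modes, the other input port is in the vacuum, and the vacuum is mapped to the vacuum. Writing $\ket{\alpha} = e^{-|\alpha|^2/2}\, e^{\alpha a^\dagger}\ket{0}$, one gets

$$
e^{-|\alpha|^2/2}\, e^{\alpha a^\dagger}\ket{0}\otimes\ket{0}
\;\longrightarrow\;
e^{-|\alpha|^2/2}\, e^{\alpha\left(t\, b_1^\dagger + r\, b_2^\dagger\right)}\ket{0}\otimes\ket{0}
= \left(e^{-|t\alpha|^2/2}\, e^{t\alpha\, b_1^\dagger}\ket{0}\right)\otimes\left(e^{-|r\alpha|^2/2}\, e^{r\alpha\, b_2^\dagger}\ket{0}\right)
= \ket{t\alpha}\otimes\ket{r\alpha}.
$$

The exponential factorizes because $b_1^\dagger$ and $b_2^\dagger$ commute, and the normalization splits because $|t|^2+|r|^2=1$.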

Since the output state is a product state, the statistics of any measurement performed at one output are independent of those of a measurement performed at the other output. Furthermore, since the beamsplitter is a passive component, the total number of photons in the input state $\ket{\alpha}$ is the sum of the numbers of photons at the outputs.

Now, you can also add beamsplitters at the outputs and construct a tree of beamsplitters with $N\gg|\alpha|^2$ balanced outputs, transforming the input coherent state $\ket{\alpha}$ into the product of $N$ coherent states $\Ket{\tfrac{\alpha}{\sqrt{N}}}^{\otimes N}$. As before, the total number of photons is conserved, so the photon number of $\ket{\alpha}$ is distributed as the sum of the photon numbers of the $N$ independent outputs, each with a small average photon number $\tfrac{|\alpha|^2}{N}$. When $N \to \infty$, the only distribution with this property is the Poisson distribution. QED.
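The limit can also be checked numerically. The sketch below is only an illustration under an extra assumption that is not part of the argument above: each of the $N$ outputs is modelled as containing either zero or one photon, with probability $|\alpha|^2/N$ for one photon, so multi-photon events in a single output are neglected. The total count is then binomial, and the code compares it with a Poisson distribution of the same mean. The function names and the value `MEAN = 4.0` are arbitrary choices of mine.

```python
import math

MEAN = 4.0  # plays the role of |alpha|^2; arbitrary value chosen for illustration


def binomial_pmf(k, n, p):
    """P(k successes) for n independent outputs, each holding one photon with probability p."""
    if k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, mean):
    """Poisson probability of counting k photons when the average is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def total_variation(n, mean, k_max=40):
    """Total variation distance between Binomial(n, mean/n) and Poisson(mean).

    The sum is truncated at k_max, which is harmless as long as k_max is
    far above the mean.
    """
    p = mean / n
    return 0.5 * sum(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, mean)) for k in range(k_max + 1)
    )


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(f"N = {n:6d}   distance to Poisson = {total_variation(n, MEAN):.2e}")
```

The printed distance shrinks as $N$ grows, which is the numerical counterpart of the statement that the sum of many independent, rarely occupied outputs is Poisson distributed.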

Link with independence of successive detection events

Note that, in the reasoning above, the beamsplitters do not need to be actual objects splitting beams. Anything that changes the basis of spacetime modes does the job. In particular, let your coherent state be in the mode corresponding to a pulse of light. You can also “slice” the pulse into $N$ short time slices. This description is exactly equivalent to the beamsplitter tree above, and corresponds to the intuition formulated by @AccidentalFourierTransform and @ThomasS above about the independence of successive photon detection events.

In all the descriptions above, I have implicitly assumed that the other port of each beamsplitter is empty, that is, it receives the vacuum state $\ket{0}$. This crucial assumption is still present when I “slice” the coherent state into many time slices: the initial $N-1$ vacua are in spacetime modes that are orthogonal to the original light pulse.