Intuitive explanation of Poisson distribution

Since you asked for an intuition, and there are many online derivations of the pmf of the Poisson distribution (e.g. here or here) that already follow a mathematically rigorous sequence, I'll take a shot at treating it almost as a mnemonic construction.

So the pmf is

$$f_X(x=k)=\frac{\lambda^k\mathrm e^{-\lambda}}{k!}$$

Think of the Poisson parameter $\lambda$ as reflecting the odds of an event happening in any time period. After all, it is a rate (events per time period), and hence, the higher the rate, the more likely it is that a certain number of events takes place in a given time period. Further, you already mention how the Poisson pmf is derived from the binomial by allowing $n$ to go to infinity; and in the binomial distribution, the expectation is $np,$ equal to $\lambda$ in the Poisson, so that $p=\frac{\lambda}{n}.$

Notice, for instance, that in the derivation of the Poisson pmf, $\left(\frac{\lambda}{n}\right)^k$ is precisely introduced as the $p^k$ (the probability of $k$ successes) in the binomial pmf, $\binom{n}{k}p^k(1-p)^{n-k}.$ The denominator $n^k$ is later eliminated as we calculate the limit $n\to\infty,$ and indeed, $\lambda^k$ is "left over" from this initial probability formula.
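This limit is easy to check numerically: holding $\lambda = np$ fixed, the binomial pmf with $p=\lambda/n$ approaches the Poisson pmf as $n$ grows. A small Python sketch (the values $\lambda=3$ and $k=2$ are arbitrary examples):

```python
import math

lam, k = 3.0, 2  # example rate and event count

def binom_pmf(n, k, p):
    """Binomial pmf: C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Poisson pmf at k for rate lam
poisson = lam**k * math.exp(-lam) / math.factorial(k)

# With p = lam/n, the binomial pmf approaches the Poisson pmf as n grows.
for n in (10, 100, 10_000):
    print(n, binom_pmf(n, k, lam / n))
print("Poisson:", poisson)
```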

Now, in the pmf you have the term raised to the $k$-th power, i.e. $\lambda^k$, and it makes intuitive sense, because each occurrence is independent of the preceding and subsequent ones. So if we are calculating the probability of $k$ iid events happening in a time period, we shouldn't be surprised to end up with $\underbrace{\lambda\cdot\lambda \cdots\lambda}_k=\lambda^k$.

Since these events are indistinguishable from each other, it is not surprising either that we have to prevent over-counting by dividing by the number of permutations of these events, $k!.$ This, in fact, is exactly the role of that term in the combinations formula $\binom{n}{k}=\frac{n!}{(n-k)!\color{blue}{k!}}.$

And for the term $e^{-\lambda}$ we can bring into play the inter-arrival time, which follows an exponential distribution: as the rate $\lambda$ increases, the inter-arrival time decreases. We can think of this factor as decreasing the probability of a low number of events $k$ when the rate $\lambda$ is high.
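The link between exponential inter-arrival times and Poisson counts can be checked by simulation: draw exponential gaps, count how many arrivals land in a unit interval, and compare the empirical frequencies with the Poisson pmf. A sketch in Python (the helper name `count_events` and the values of `lam` and `trials` are made up for illustration):

```python
import math
import random

random.seed(0)
lam, trials = 4.0, 200_000  # example rate and sample size

def count_events(rate):
    """Count arrivals in [0, 1) when gaps are Exponential(rate)."""
    t, n = 0.0, 0
    while True:
        t += random.expovariate(rate)  # exponential inter-arrival time
        if t >= 1.0:
            return n
        n += 1

counts = [count_events(lam) for _ in range(trials)]
for k in range(8):
    empirical = counts.count(k) / trials
    theory = lam**k * math.exp(-lam) / math.factorial(k)
    print(k, round(empirical, 4), round(theory, 4))
```

The empirical frequencies should track the Poisson pmf closely, and the sample mean of the counts should be near $\lambda$.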


Suppose $n$ successes occur in an interval $[0, t)$ and let their times be given by the $n$-tuple $(t_1, \dots, t_n),$ $t_i \leq t$.

The set of events where exactly $n$ successes occur, written in terms of the inter-arrival gaps $x_1, \dots, x_n$, can be measured as $$ \int_0^{t} \int_0^{t - x_1} \cdots \int_0^{ t - \sum_{i = 1}^{n-1} x_i } dx_n\, dx_{n-1} \cdots dx_2\, dx_1 = \frac{ t^n } { n! }$$
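This volume is perhaps easiest to see in arrival-time coordinates: it is the ordered region $0 \le t_1 \le \cdots \le t_n \le t$, which occupies a fraction $1/n!$ of the cube $[0,t]^n$, giving $t^n/n!$. A Monte Carlo sanity check (`simplex_volume_mc` is a hypothetical helper name; $t=1.5$ is an arbitrary example):

```python
import math
import random

random.seed(1)

def simplex_volume_mc(n, t, samples=200_000):
    """Estimate the volume of {0 <= t_1 <= ... <= t_n <= t} by sampling
    points uniformly in the cube [0, t]^n and counting the ordered ones."""
    hits = 0
    for _ in range(samples):
        pt = [random.uniform(0, t) for _ in range(n)]
        if all(pt[i] <= pt[i + 1] for i in range(n - 1)):
            hits += 1
    return hits / samples * t**n

# Compare the estimate against t^n / n!
for n in (2, 3, 4):
    print(n, simplex_volume_mc(n, 1.5), 1.5**n / math.factorial(n))
```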

Importantly, the size of the sample space of all events is measured by considering the size of all possible $k$-tuples, for all $k \geq 0$:

$$ \sum_{k = 0}^{\infty} \frac{ t^k }{ k! } = e^t$$

Taking the ratio of the sizes of these sets yields the probability that $n$ events occur in the interval $[0, t)$.

$$\boxed{ P \{ X = n \} = e^{-t} \frac{ t^n }{ n! } }$$

Note:

More generally, the event rate can be made non-homogeneous with a scalar function $\lambda(t)$. When the rate is constant for all time, i.e., $\lambda(t) = \lambda$, we write

$$P(X = n) = e^{-\lambda t}\frac{ (\lambda t)^n } { n! }$$

Letting $t = 1$ gives the process on a unit time interval, scaled by $\lambda$. Although we're interested in $[0, 1)$, it's as if we're looking at the interval $[0, \lambda)$.
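As a final numerical sanity check of the pmf above: the probabilities $e^{-\lambda t}(\lambda t)^n/n!$ should sum to $1$, and the mean should be $\lambda t$. A minimal sketch (the values of `lam` and `t` are arbitrary examples):

```python
import math

lam, t = 2.5, 2.0  # example rate and interval length

# Poisson pmf over n = 0..59; the tail beyond n = 60 is negligible here
pmf = [math.exp(-lam * t) * (lam * t)**n / math.factorial(n) for n in range(60)]

total = sum(pmf)                              # should be ~1
mean = sum(n * p for n, p in enumerate(pmf))  # should be ~lam * t = 5
print(total, mean)
```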