Why is the Gaussian so pervasive in mathematics?

Quadratic (or bilinear) forms appear naturally throughout mathematics, for instance via inner product structures, or via dualisation of a linear transformation, or via Taylor expansion around the linearisation of a nonlinear operator. The Laplace-Beltrami operator and similar second-order operators can be viewed as differential quadratic forms, for instance.
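To make the Taylor-expansion example explicit (a standard second-order expansion, included just to show where the quadratic form enters): for smooth $f:\mathbb{R}^n\to\mathbb{R}$,
$$f(x+h) = f(x) + \nabla f(x)\cdot h + \tfrac{1}{2}\, h^{\mathsf T}\, \nabla^2 f(x)\, h + o(|h|^2),$$
so the first correction to the linearisation is precisely a quadratic form, the one given by the Hessian.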

A Gaussian is basically the multiplicative or exponentiated version of a quadratic form, so it is quite natural that it comes up in multiplicative contexts, especially on spaces (such as Euclidean space) in which a natural bilinear or quadratic structure is already present.
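Unwinding the slogan: the general centred Gaussian on $\mathbb{R}^n$ is, up to normalisation,
$$x \mapsto e^{-\frac{1}{2}\langle Ax, x\rangle}$$
for some positive-definite symmetric $A$, i.e. exactly the exponential of (minus half) a quadratic form; the standard self-dual Gaussian $e^{-\pi |x|^2}$ is the case $A = 2\pi \operatorname{Id}$.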

Perhaps the one minor miracle, though, is that the Fourier transform of a Gaussian is again a Gaussian, although once one realises that the Fourier kernel is also an exponentiated bilinear form, this is not so surprising. But it does amplify the previous paragraph: thanks to Fourier duality, Gaussians not only come up in the context of spatial multiplication, but also frequency multiplication (e.g. convolutions, and hence the central limit theorem, or heat kernels).
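In one dimension the miracle is a routine completing-the-square computation (with the normalisation $\hat f(\xi)=\int_{\mathbb{R}} f(x)\, e^{-2\pi i x\xi}\,dx$):
$$\int_{\mathbb{R}} e^{-\pi x^2}\, e^{-2\pi i x \xi}\,dx = e^{-\pi \xi^2}\int_{\mathbb{R}} e^{-\pi (x+i\xi)^2}\,dx = e^{-\pi \xi^2},$$
the last step being a contour shift together with $\int_{\mathbb{R}} e^{-\pi u^2}\,du = 1$; so $e^{-\pi x^2}$ is its own Fourier transform.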

One can also take an adelic viewpoint. When studying non-archimedean fields such as the $p$-adics $\mathbb{Q}_p$, compact subgroups such as $\mathbb{Z}_p$ play a pivotal role. On the reals, it seems the natural analogue of these compact subgroups are the Gaussians (cf. Tate's thesis). One can sort of justify the existence and central role of Gaussians on the grounds that the real number system "needs" something like the compact subgroups that its non-archimedean siblings enjoy, though this doesn't fully explain why Gaussians would then be exponentiated quadratic in nature.
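One concrete instance of the analogy, standard in Tate's thesis: with the usual additive character $\psi_p$ of $\mathbb{Q}_p$ and Haar measure normalised so that $\mathbb{Z}_p$ has volume $1$, the indicator function $\mathbf{1}_{\mathbb{Z}_p}$ is its own Fourier transform,
$$\int_{\mathbb{Q}_p} \mathbf{1}_{\mathbb{Z}_p}(x)\,\psi_p(x\xi)\,dx = \mathbf{1}_{\mathbb{Z}_p}(\xi),$$
exactly as $e^{-\pi x^2}$ is its own Fourier transform on $\mathbb{R}$; both serve as the standard local test functions at their respective places.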


(The sort of obvious answer, from having taught statistics several times:)

The sum of two independent normal random variables is again normal, i.e., the shape of the distribution is unchanged under addition, up to translation and rescaling.

Moreover, the normal distribution is unique among distributions with finite variance in having this property: the other stable laws, such as the Cauchy distribution, all have infinite (or undefined) variance.
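The closure property is a one-line characteristic-function computation: if $X\sim N(\mu_1,\sigma_1^2)$ and $Y\sim N(\mu_2,\sigma_2^2)$ are independent, then
$$\mathbb{E}\, e^{it(X+Y)} = e^{i\mu_1 t - \sigma_1^2 t^2/2}\cdot e^{i\mu_2 t - \sigma_2^2 t^2/2} = e^{i(\mu_1+\mu_2) t - (\sigma_1^2+\sigma_2^2) t^2/2},$$
which is precisely the characteristic function of $N(\mu_1+\mu_2,\ \sigma_1^2+\sigma_2^2)$.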

Many phenomena in nature come from adding together various independent or almost independent terms. Therefore, we would expect the normal distribution to show up a lot in nature-inspired mathematics.


I recently came across a strange and beautiful connection between the Gaussian $e^{-x^2}$ and the method of least squares. It turns out that the square in $e^{-x^2}$ and the square in "least squares" are the same square.

Let $(x_i,y_i)$ (with $1\leq i \leq n$) be the data set, and assume that for each $x$, the $y$'s are normally distributed with mean $\mu(x)=\alpha x+\beta$ and variance $\sigma^2$. Then the likelihood of generating our data (assuming that the data points are independent) is $$\prod_{i=1}^n \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left(-\frac{(y_i-\mu(x_i))^2}{2\sigma^2}\right) =\left(\frac{1}{\sqrt{2\pi \sigma^2}}\right)^n \exp\left( \frac{-1}{2\sigma^2} \sum_{i=1}^n (y_i - \alpha x_i-\beta)^2 \right)$$ We would obviously want to choose the parameters $\alpha,\beta$ so that the likelihood is maximized, and since the prefactor does not depend on $\alpha,\beta$ and the exponential is decreasing in the sum, this is accomplished by minimizing $$\sum_{i=1}^n (y_i - \alpha x_i-\beta)^2.$$ In other words, the least squares approximation is the one that makes the data set most likely to happen.
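To complete the minimization (these are the standard normal equations, not specific to this answer): setting the partial derivatives of $S(\alpha,\beta)=\sum_{i=1}^n (y_i-\alpha x_i-\beta)^2$ to zero gives
$$\frac{\partial S}{\partial \alpha} = -2\sum_{i=1}^n x_i\,(y_i - \alpha x_i - \beta) = 0, \qquad \frac{\partial S}{\partial \beta} = -2\sum_{i=1}^n (y_i - \alpha x_i - \beta) = 0,$$
whose solution is
$$\hat\alpha = \frac{\sum_i (x_i-\bar x)(y_i-\bar y)}{\sum_i (x_i-\bar x)^2}, \qquad \hat\beta = \bar y - \hat\alpha\,\bar x,$$
so the maximum-likelihood line under the Gaussian noise model coincides with the classical least-squares regression line. Note that $\hat\alpha,\hat\beta$ do not depend on $\sigma^2$: the Gaussian model picks out which line is best without our needing to know the noise level.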