Are random errors necessarily Gaussian?

Errors are very often Gaussian, but not always. Here are some physical systems where random fluctuations (or "errors", if you're in a context where the thing that's varying constitutes an error) are not Gaussian; a short simulation sketch of all three follows the list:

  1. The distribution of times between clicks in a photodetector exposed to light is an exponential distribution.$^{[a]}$

  2. The number of times a photodetector clicks in a fixed period of time follows a Poisson distribution.

  3. The position offset of a light beam hitting a target some distance away, due to uniformly distributed angle errors, follows a Cauchy distribution.
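A minimal simulation sketch of these three cases (assuming NumPy; the click rate, counting window, and target distance are arbitrary illustrative values, not taken from any particular setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# 1. Waiting times between photodetector clicks: exponential.
rate = 5.0  # mean click rate, clicks per second (arbitrary)
waits = rng.exponential(scale=1.0 / rate, size=n)

# 2. Number of clicks in a fixed 1-second window: Poisson.
counts = rng.poisson(lam=rate * 1.0, size=n)

# 3. Offset on a target at distance d from a uniformly distributed
#    pointing angle: offset = d * tan(theta), which is Cauchy distributed.
d = 1.0  # distance to the target (arbitrary units)
theta = rng.uniform(-np.pi / 2, np.pi / 2, size=n)
offsets = d * np.tan(theta)

# None of these are Gaussian: the exponential is one-sided and skewed,
# the Poisson is discrete, and the Cauchy has no finite mean or variance.
print(waits.mean(), counts.mean(), np.median(offsets))
```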

I have seen random errors defined as those which average to zero as the number of measurements goes to infinity, with the error equally likely to be positive or negative. This only requires a probability distribution that is symmetric about zero.

There are distributions that have equal weight on the positive and negative side, but are not symmetric. Example: $$ P(x) = \left\{ \begin{array}{ll} 1/2 & x=1 \\ 1/4 & x=-1 \\ 1/4 & x=-2 \, . \end{array}\right.$$
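Explicitly, this distribution puts equal weight on each side of zero, $$ P(x>0) = \tfrac{1}{2} = \tfrac{1}{4} + \tfrac{1}{4} = P(x<0) \, ,$$ yet it is not symmetric, since for example $P(x=1) = \tfrac{1}{2} \neq \tfrac{1}{4} = P(x=-1)$.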

However, typing this question into Google, I did not find a single source suggesting that random errors could be anything other than Gaussian. Why must random errors be Gaussian?

The fact that it's not easy to find references to non-Gaussian random errors does not mean that all random errors are Gaussian :-)

As mentioned in the other answers, many distributions in Nature are Gaussian because of the central limit theorem. The central limit theorem says that given a random variable $x$ distributed according to a function $X(x)$ with finite second moment, a random variable $y$ defined as the average of many independent instances of $x$, i.e. $$y \equiv \frac{1}{N} \sum_{i=1}^N x_i \, ,$$ has a distribution $Y(y)$ that approaches a Gaussian as $N$ grows.
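A minimal numerical sketch of this (the exponential distribution, sample count, and values of $N$ below are arbitrary choices of mine): averages of a strongly skewed distribution look more and more Gaussian as $N$ grows, which shows up as a skewness shrinking roughly like $1/\sqrt{N}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Average N independent draws from a decidedly non-Gaussian distribution
# (exponential with mean 1, skewness 2) and estimate the skewness of the
# resulting averages; it shrinks toward 0 (the Gaussian value) as N grows.
for N in (1, 10, 1000):
    y = rng.exponential(scale=1.0, size=(10_000, N)).mean(axis=1)
    skew = np.mean(((y - y.mean()) / y.std()) ** 3)
    print(f"N={N:5d}  skewness of the averages: {skew:+.3f}")
```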

The thing is, many physical processes are the sums of smaller processes. For example, the fluctuating voltage across a resistor is the sum of the voltage contributions from many individual electrons. Therefore, when you measure a voltage, you get the underlying "static" value plus some random error produced by the noisy electrons, which, because of the central limit theorem, is Gaussian distributed. In other words, Gaussian distributions are very common because so many of the random things in Nature come from a sum of many small contributions.

However,

  1. There are plenty of cases where the constituents of an underlying error mechanism have a distribution that does not have a finite second moment; the Cauchy distribution is the most common example.

  2. There are also plenty of cases where an error is simply not the sum of many small underlying contributions.

Either of these cases leads to non-Gaussian errors. The sketch below illustrates the first one: averaging Cauchy-distributed contributions never produces a Gaussian.
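A quick numerical sketch of that case (my own illustration; the sample sizes are arbitrary): because the Cauchy distribution has no finite variance, averaging more and more samples does not concentrate the result at all; in fact, the mean of $N$ standard Cauchy samples is itself standard Cauchy.

```python
import numpy as np

rng = np.random.default_rng(2)

# The standard Cauchy has no finite variance, so the central limit theorem
# does not apply: the average of N Cauchy samples is just as spread out as
# a single sample (its interquartile range stays near 2 instead of shrinking).
for N in (1, 10, 1000):
    y = rng.standard_cauchy(size=(10_000, N)).mean(axis=1)
    q25, q75 = np.percentile(y, [25, 75])
    print(f"N={N:5d}  IQR of the averages: {q75 - q25:.2f}")
```

Contrast this with the finite-variance example above, where the spread of the averages shrinks like $1/\sqrt{N}$.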

$[a]$: See this other Stack Exchange post.


The reason is probably the central limit theorem: when you add many independent random variables (each with finite variance), their sum tends toward a normal distribution, largely irrespective of their individual probability distributions. This makes a normal distribution a pretty good guess if you do not have information about the origin of the error, or if you have multiple sources of error. Additionally, normal distributions often occur in real-world processes.


Answers here have generally addressed the different question of whether empirical variables should be Gaussian, but 21joanna12 asked about experimental errors, which admit a completely different analysis. The best resource I can recommend on that question is Chapter 7 of Probability Theory: The Logic of Science by E. T. Jaynes. In short, there are good reasons errors are Gaussian (albeit not literally always):

  • Sec. 7.2 considers the Herschel-Maxwell derivation, which shows that a vector-valued error of dimension $\ge 2$ with independent errors in orthogonal Cartesian components and a spherically symmetric distribution must be Gaussian in each component (and hence jointly Gaussian). (Well, actually the book only checks the $2$-dimensional case explicitly, but the argument is easily extended.)
  • Sec. 7.3 considers the Gauss derivation, which shows a Gaussian distribution is the only way for the MLE of a location parameter to be equal to the arithmetic mean of the data. The notation assumes $1$-dimensional data, but I think the argument generalises provided the error's Cartesian coordinates are uncorrelated.
  • Sec. 7.5 considers the Landau derivation, which presents a Taylor-series argument that a 1D error $e$ of finite variance and zero mean has a pdf, say $p$, satisfying the diffusion equation $\partial_{\sigma^2}p=\frac{1}{2}\partial_e^2 p$ with $\sigma^2$ a variance parameter. The requirement that $\sigma^2=0\implies p(e)=\delta(e)$ then implies the solution is Gaussian. (A quick check that the Gaussian indeed solves this equation appears after this list.)
  • Sec. 7.9 shows that without prior information, a 1D error's distribution has the following property iff it's Gaussian: the unique choice of $w_i\ge 0$ with $\sum_i w_i=1$ that minimises the variance of the linear estimator $\sum_i w_i x_i$ of the error's mean, with the $x_i$ our $n$ empirical data, is the equal weighting $w_i=n^{-1}$.
  • A related point discussed in Sec. 7.11 is that an error of given finite mean and variance maximises its entropy subject to that information iff its distribution is Gaussian. Jaynes argues that any non-entropy-maximising model exaggerates how much we can infer from our limited knowledge.
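As a quick check of the diffusion-equation claim in the Sec. 7.5 bullet: for the Gaussian $p(e;\sigma^2) = (2\pi\sigma^2)^{-1/2}\exp\left(-e^2/2\sigma^2\right)$, direct differentiation gives $$ \partial_{\sigma^2}\, p = \left(\frac{e^2}{2\sigma^4} - \frac{1}{2\sigma^2}\right) p \qquad \text{and} \qquad \frac{1}{2}\,\partial_e^2\, p = \frac{1}{2}\left(\frac{e^2}{\sigma^4} - \frac{1}{\sigma^2}\right) p \, ,$$ which agree, and as $\sigma^2 \to 0$ the density tends to $\delta(e)$, matching the stated initial condition.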

However, the short Sec. 7.12 (which I reproduce in full) gives examples where we don't expect Gaussian errors:

Once we understand the reasons for the success of Gaussian inference, we can also see very rare special circumstances where a different sampling distribution would better express our state of knowledge. For example, if we know that the errors are being generated by the unavoidable and uncontrollable rotation of some small object, in such a way that when it is at angle $\theta$, the error is $e=\alpha\cos\theta$ but the actual angle is unknown, a little analysis shows that the prior probability assignment $p(e|I)=(\pi\sqrt{\alpha^2-e^2})^{-1},\,e^2<\alpha^2$, correctly describes our state of knowledge about the error. Therefore it should be used instead of the Gaussian distribution; since it has a sharp upper bound, it may yield appreciably better estimates than would the Gaussian – even if $\alpha$ is unknown and must therefore be estimated from the data (or perhaps it is the parameter of interest to be estimated).

Or, if the error is known to have the form $e = \alpha\tan\theta$ but $\theta$ is unknown, we find that the prior probability is the Cauchy distribution $p(e|I) = \pi^{-1}\alpha/(\alpha^2 + e^2)$. Although this case is rare, we shall find it an instructive exercise to analyze inference with a Cauchy sampling distribution, because qualitatively different things can happen. Orthodoxy regards this as ‘a pathological, exceptional case’ as one referee put it, but it causes no difficulty in Bayesian analysis, which enables us to understand it.

Note these examples use the same Bayesian techniques as Sec. 7.11.
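As a small illustration of the first example (my own sketch, not from Jaynes; $\alpha$ and the sample size are arbitrary), drawing the angle uniformly and setting $e = \alpha\cos\theta$ reproduces a distribution bounded by $\pm\alpha$ and piled up near the edges, nothing like a Gaussian. The comparison below uses the CDF implied by Jaynes's density, $P(e \le x) = \tfrac{1}{2} + \arcsin(x/\alpha)/\pi$.

```python
import numpy as np

rng = np.random.default_rng(3)

alpha = 1.0                       # amplitude of the rotating error (arbitrary)
theta = rng.uniform(0.0, 2 * np.pi, size=200_000)
e = alpha * np.cos(theta)         # error generated by an unknown rotation angle

# Empirical CDF vs. the CDF of p(e|I) = 1/(pi*sqrt(alpha^2 - e^2)),
# namely P(e <= x) = 1/2 + arcsin(x/alpha)/pi for |x| < alpha.
for x in (-0.9, -0.5, 0.0, 0.5, 0.9):
    empirical = np.mean(e <= x)
    analytic = 0.5 + np.arcsin(x / alpha) / np.pi
    print(f"P(e <= {x:+.1f}): empirical {empirical:.3f}, analytic {analytic:.3f}")
```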