What's so special about standard deviation?

There's a very nice geometric interpretation.

Random variables of finite variance form a vector space. Covariance is a useful inner product on that space. Oh, wait, that's not quite right: constant variables are orthogonal to themselves in this product, so it's only positive semi-definite. So, let me be more precise - on the quotient space formed by the equivalence relation "differs by an (almost sure) constant", covariance is a true inner product. (If quotient spaces are an unfamiliar concept, just focus on the vector space of zero-mean, finite-variance variables; it gets you the same outcome in this context.)

Right, let's carry on. In the norm this inner product induces, standard deviation is a variable's length, while the correlation coefficient between two variables (their covariance divided by the product of their standard deviations) is the cosine of the "angle" between them. That the correlation coefficient is in $[-1,\,1]$ is then a restatement of the vector space's Cauchy-Schwarz inequality.
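
To spell this out in symbols (my notation for the correspondence, with $\theta$ the "angle" just mentioned):

$$\langle X, Y \rangle = \operatorname{Cov}(X, Y), \qquad \|X\| = \sqrt{\operatorname{Var}(X)} = \sigma_X, \qquad \cos\theta = \frac{\operatorname{Cov}(X, Y)}{\sigma_X\,\sigma_Y} = \rho_{XY},$$

and the Cauchy-Schwarz inequality $|\langle X, Y \rangle| \le \|X\|\,\|Y\|$ is exactly the statement $|\rho_{XY}| \le 1$.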


I take it as unproblematic that the standard deviation is important in the normal distribution, since the standard deviation (or variance) is one of its parameters (though it could doubtless be reparameterized in various ways). By the Central Limit Theorem, the normal distribution is in turn relevant for understanding just about any distribution: if $X$ is a variable with mean $\mu$ and finite standard deviation $\sigma$, and $\overline{X}$ is the mean of $n$ independent copies of $X$, then for large $n$

$$\frac{\overline{X} - \mu}{\frac{\sigma}{\sqrt{n}}}$$

is approximately standard normal. No other measure of dispersion can so relate $X$ with the normal distribution. Said simply, the Central Limit Theorem in and of itself guarantees that the standard deviation plays a prominent role in statistics.
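
If you'd like to see that standardization in action, here is a small simulation sketch (my own illustration, not part of the argument); the exponential distribution and the sample sizes are arbitrary choices:

```python
import numpy as np

# Illustration only: the Exponential(1) distribution is clearly non-normal,
# but it has mean 1 and standard deviation 1, so mu = sigma = 1 below.
rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0
n = 1_000        # size of each sample
reps = 100_000   # number of simulated sample means

samples = rng.exponential(scale=1.0, size=(reps, n))
xbar = samples.mean(axis=1)
z = (xbar - mu) / (sigma / np.sqrt(n))   # the standardized quantity above

# If the normal approximation is good, z has mean ~ 0, standard deviation ~ 1,
# and about 95% of its values lie within 1.96 of zero.
print(z.mean(), z.std(), np.mean(np.abs(z) < 1.96))
```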


An interesting feature of the standard deviation is its connection to the (root) mean square error, which measures how far a predictor's values typically fall from the observed values. The root mean square error of using the mean as a predictor is the standard deviation, and this is the smallest root mean square error you can get with a constant predictor.
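
To see why the mean is the best constant predictor in this sense, note that for any constant $c$

$$\mathbb{E}\big[(X - c)^2\big] = \operatorname{Var}(X) + \big(\mathbb{E}[X] - c\big)^2,$$

which is minimized at $c = \mathbb{E}[X]$, where the root mean square error is exactly $\sqrt{\operatorname{Var}(X)} = \sigma$.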

(This, of course, shifts the question to why the root mean square error is interesting. I find it a bit more intuitive than the standard deviation, though: you can see it as the $L_2$ norm of the error vector, scaled down by the square root of the number of points.)
