Why do statisticians like "$n-1$" instead of "$n$"?

(Too long for a comment:)

I can offer an explanation of why dividing by $n$ would give an underestimate of the variance. The sum of squares $\sum (X_i - \overline{X})^2$, where $\overline{X}$ is the sample mean, is smaller than the sum $\sum (X_i - \mu)^2$, where $\mu$ is the true mean. This is because $\overline{X}$ is expected to be "closer" to the data points than the true mean, since $\overline{X}$ is computed from the data itself. In fact, $\overline{X}$ is the value of $t$ that minimizes the sum $\sum (X_i - t)^2$. This shows that we underestimate the variance, so we should divide by something smaller than $n$. To put it even less formally: you try to measure how spread out your data is by looking at the deviations from the sample mean, and that always understates the spread. The sample mean is as close to the data as possible, whereas the true mean will typically lie farther away.
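A minimal numerical sketch of that minimization claim (the distribution, its parameters, and the sample size below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 10.0, 2.0               # true mean and standard deviation (arbitrary)
x = rng.normal(mu, sigma, size=5)   # a small i.i.d. sample


def ss(t):
    """Sum of squared deviations of the sample from a candidate center t."""
    return np.sum((x - t) ** 2)


xbar = x.mean()
# The sum of squares around the sample mean never exceeds the one around mu ...
print(ss(xbar), "<=", ss(mu))
# ... and a coarse grid search confirms the minimizer sits at (roughly) xbar.
grid = np.linspace(mu - 3 * sigma, mu + 3 * sigma, 10_001)
print(grid[np.argmin([ss(t) for t in grid])], "vs", xbar)
```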

The reason we divide by precisely $n-1$ is that this choice makes the estimator unbiased (as pointed out in the comments).
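For completeness, here is the standard computation behind that claim. Writing $\sigma^2$ for the true variance and using the identity $\sum (X_i - \overline{X})^2 = \sum (X_i - \mu)^2 - n(\overline{X} - \mu)^2$ together with $\mathbb{E}[(\overline{X} - \mu)^2] = \sigma^2/n$,
$$\mathbb{E}\left[\sum_{i=1}^n (X_i - \overline{X})^2\right] = n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2,$$
so dividing the sum of squares by $n-1$ yields an estimator with expected value exactly $\sigma^2$.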


Any weighted average of the $(X_i - \mu)^2$'s (with weights summing to $1$) is an unbiased estimator of the variance, because each $(X_i - \mu)^2$ has expected value $\sigma^2$. This is why you "should use $\mu$ instead of $\bar{x}$ and divide by $n$, if the true mean is known".

Unfortunately, $\mu$ is usually unknown.

Of all the procedures that try to correct for this problem by replacing $\mu$ with a function of the $X_i$, the one that replaces $\mu$ by the average of the $X_i$ (with the same weights) minimizes the resulting sum of squares, and therefore pushes its expected value below that of the unbiased estimator that uses $\mu$.
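To see the minimization claim explicitly: for weights $w_i \ge 0$ with $\sum_i w_i = 1$ and $\overline{X}_w = \sum_i w_i X_i$,
$$\sum_i w_i (X_i - t)^2 = \sum_i w_i (X_i - \overline{X}_w)^2 + (t - \overline{X}_w)^2,$$
which is minimized exactly at $t = \overline{X}_w$. Plugging the weighted average in for $\mu$ therefore never increases, and typically decreases, the weighted sum of squares.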

It is not surprising at all that modifying an unbiased procedure makes it biased, and one can wrap a story around this by saying that the bias comes from the error in estimating $\mu$. Without making that story more specific, for example by using the variance decomposition, the narrative is technically correct but only restates the fact that an unbiased procedure was altered (by estimating $\mu$).

The miracle is that the correction factor that compensates for the bias, $n/(n-1)$, is independent of the distribution of the (i.i.d.) $X_i$, as long as the variance is finite. This is a unique property of variance and least-squares estimation.
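A quick Monte Carlo sketch of that distribution-independence; the sample size, repetition count, and the three distributions below are arbitrary choices, and NumPy's `ddof` argument is used to switch between the $n$ and $n-1$ denominators:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 5, 200_000  # small samples, many repetitions (arbitrary choices)

# Three i.i.d. models with known true variance.
samplers = {
    "normal(0, 3)":   (lambda: rng.normal(0.0, 3.0, size=(reps, n)), 9.0),
    "exponential(2)": (lambda: rng.exponential(2.0, size=(reps, n)), 4.0),
    "uniform(0, 1)":  (lambda: rng.uniform(0.0, 1.0, size=(reps, n)), 1.0 / 12.0),
}

for name, (draw, true_var) in samplers.items():
    samples = draw()
    divide_by_n = samples.var(axis=1, ddof=0).mean()     # biased:   sum of squares / n
    divide_by_nm1 = samples.var(axis=1, ddof=1).mean()   # unbiased: sum of squares / (n - 1)
    print(f"{name:15s} true={true_var:.4f}  /n -> {divide_by_n:.4f}  /(n-1) -> {divide_by_nm1:.4f}")
```

For every distribution the $/n$ column lands near $\frac{n-1}{n}$ times the true variance, and the same $n/(n-1)$ factor repairs all three.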


If you know the entire population you're talking about, the formula with $n$ in the denominator will give you its true variance.

However, if you don't know the entire population but only have a limited number of random samples from it, it is unlikely that your sample will exhibit the full variation of the underlying population (in particular, a random sample probably won't include the largest or smallest values in the population). So the variance of the random sample will likely be smaller than the true variance of the underlying population.

If you want to compute an estimate of the variance of the underlying population, you use the formula with $n-1$ in the denominator. This gives a value slightly larger than the variance of the sample, and under appropriate assumptions it is in a certain technical sense the "best possible" guess about the unknown underlying population. Roughly speaking, the $(n-1)$-estimate of the variance you get from a random sample is about as likely to be too high as to be too low.
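As a concrete illustration, here are both formulas applied to a tiny made-up sample, together with NumPy's built-in versions (the `ddof` argument selects the denominator):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0])         # a tiny made-up sample
n = x.size
ss = np.sum((x - x.mean()) ** 2)      # sum of squared deviations from the sample mean: 8.0

print(ss / n)                         # 2.666...  divide by n     (population-style formula)
print(ss / (n - 1))                   # 4.0       divide by n - 1 (Bessel's correction)
print(np.var(x), np.var(x, ddof=1))   # the same two values via NumPy
```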

When the sample size $n$ is large, dividing by $n$ or $n-1$ does not make much of a difference, but for small samples the difference can be significant.

As an extreme case, if you sample just one value from the population, that tells you nothing about how much the values in the population differ from each other; this shows up in the variance estimate formula as a division by zero.
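The $n=1$ degenerate case shows up numerically as well; with NumPy the $n-1$ formula comes back as `nan` (zero divided by zero), typically along with a degrees-of-freedom warning:

```python
import numpy as np

x = np.array([5.0])        # a single observation
print(np.var(x, ddof=0))   # 0.0: says nothing about the spread of the population
print(np.var(x, ddof=1))   # nan: the n - 1 formula divides by zero here
```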