Multiple measurements of the same quantity - combining uncertainties

When you're combining measurements with different uncertainties, taking the plain (unweighted) mean is not the right thing to do. (Well, it's good enough if the uncertainties are almost the same.)

The right thing to do is a chi-squared analysis, which gives more weight to the more precise measurements. Here's how you do it:

$$\chi^2 = \sum \frac{(\text{observed value} - \text{true value})^2}{\text{(uncertainty associated with that observation)}^2}$$

You numerically choose the "true value" that minimizes $\chi^2$. That's your best guess.
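For independent measurements with Gaussian uncertainties you don't strictly need a numerical search: setting the derivative of $\chi^2$ to zero gives the inverse-variance weighted mean. Here is a minimal Python sketch of that; the function names and the sample numbers are made up for illustration.

```python
def chi_squared(values, sigmas, mu):
    """Chi-squared of the measurements for a trial 'true value' mu."""
    return sum(((x - mu) / s) ** 2 for x, s in zip(values, sigmas))


def best_estimate(values, sigmas):
    """Value of mu that minimizes chi_squared: the inverse-variance weighted mean."""
    weights = [1.0 / s**2 for s in sigmas]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Illustrative numbers only: one precise and one imprecise measurement.
values = [10.0, 11.0]
sigmas = [0.1, 0.5]
mu_hat = best_estimate(values, sigmas)
print(mu_hat, chi_squared(values, sigmas, mu_hat))
```

With these numbers the best guess lands near 10.04, much closer to the precise measurement than the plain mean of 10.5.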

Next, use the chi-squared distribution to calculate the p-value (assuming the best guess is right). (The number of degrees of freedom is one less than the number of observations.) This will tell you whether your uncertainties were reasonable or whether you underestimated them. For example, if one measurement is $5.0 \pm 0.1$ and another is $10.0 \pm 0.1$, the two disagree by 50 times the quoted uncertainty, so you probably underestimated your uncertainties.
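As a rough sketch of the p-value step, here is the example above worked through using only the Python standard library. The helper `chi2_p_value` is a hypothetical name; it evaluates the upper tail of the chi-squared distribution for integer degrees of freedom from the closed forms for 1 and 2 degrees of freedom plus a recurrence. If you have SciPy installed, `scipy.stats.chi2.sf` should give the same number.

```python
import math


def chi2_p_value(chi2, dof):
    """P(chi-squared with `dof` degrees of freedom >= chi2), for integer dof >= 1."""
    if chi2 <= 0:
        return 1.0
    half = chi2 / 2.0
    # Closed forms for dof = 1 and dof = 2 ...
    if dof % 2 == 1:
        p, a = math.erfc(math.sqrt(half)), 0.5
    else:
        p, a = math.exp(-half), 1.0
    # ... then the recurrence Q(a + 1, x) = Q(a, x) + x^a e^(-x) / Gamma(a + 1).
    while 2 * a < dof:
        p += math.exp(a * math.log(half) - half - math.lgamma(a + 1))
        a += 1.0
    return p


# The example from the text: 5.0 +/- 0.1 and 10.0 +/- 0.1.
values, sigmas = [5.0, 10.0], [0.1, 0.1]
weights = [1.0 / s**2 for s in sigmas]
best = sum(w * x for w, x in zip(weights, values)) / sum(weights)
chi2_min = sum(((x - best) / s) ** 2 for x, s in zip(values, sigmas))
p = chi2_p_value(chi2_min, dof=len(values) - 1)
print(best, chi2_min, p)
```

Here the best guess is 7.5, $\chi^2$ is about 1250 for one degree of freedom, and the p-value is vanishingly small, which is the signal that the quoted uncertainties cannot both be right.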

If you underestimated your uncertainties, which is not unusual in practice, then the right thing to do is to figure out where you went wrong in your uncertainty estimation and correct the mistake. But there is a lazier alternative too, which is often good enough if the stakes are low: you can scale up all the uncertainties by the same factor until you get a reasonable $\chi^2$ p-value, say 0.5.
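Because every term in $\chi^2$ carries one factor of $1/\sigma^2$, the scale factor can be written down directly instead of found by trial and error. Here $s$ is the common scale factor, $\chi^2_{\min}$ is the value at the best guess, and $\chi^2_{\text{target}}$ is the $\chi^2$ value whose p-value you are aiming for (all three symbols are introduced here just for illustration):

$$\sigma_i \to s\,\sigma_i \quad\Longrightarrow\quad \chi^2_{\min} \to \frac{\chi^2_{\min}}{s^2} \quad , \qquad \text{so} \qquad s = \sqrt{\frac{\chi^2_{\min}}{\chi^2_{\text{target}}}} \quad .$$

For the $5.0 \pm 0.1$ and $10.0 \pm 0.1$ example, each measurement sits 2.5 away from the best guess of 7.5, so $\chi^2_{\min} = 2 \times (2.5/0.1)^2 = 1250$ with one degree of freedom. A p-value of 0.5 corresponds to $\chi^2_{\text{target}} \approx 0.455$, giving $s \approx 52$, i.e. uncertainties of roughly $\pm 5$ instead of $\pm 0.1$. A common scale factor leaves the relative weights, and hence the best guess, unchanged.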

OK, now you have plausible measurement uncertainties, either because they were reasonable from the beginning or because you scaled them up. Next, you try varying the "true value" away from the best guess, in each direction, until the p-value dips below, say, 5%. This procedure gives you lower-bound and upper-bound error bars on your final best-guess measurement.
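A hedged sketch of that scan, again standard library only. The names `error_bars` and `chi2_p_value` are hypothetical, the sample numbers are invented, and keeping the degrees of freedom at one less than the number of observations during the scan is my assumption (the `dof` argument lets you choose otherwise).

```python
import math


def chi2_p_value(chi2, dof):
    """Upper-tail probability of the chi-squared distribution, integer dof >= 1."""
    if chi2 <= 0:
        return 1.0
    half = chi2 / 2.0
    if dof % 2 == 1:
        p, a = math.erfc(math.sqrt(half)), 0.5
    else:
        p, a = math.exp(-half), 1.0
    while 2 * a < dof:
        p += math.exp(a * math.log(half) - half - math.lgamma(a + 1))
        a += 1.0
    return p


def chi_squared(values, sigmas, mu):
    return sum(((x - mu) / s) ** 2 for x, s in zip(values, sigmas))


def error_bars(values, sigmas, alpha=0.05, dof=None):
    """Move the trial true value away from the best guess until p drops below alpha."""
    if dof is None:
        dof = len(values) - 1  # assumption: same convention as for the p-value check
    weights = [1.0 / s**2 for s in sigmas]
    best = sum(w * x for w, x in zip(weights, values)) / sum(weights)

    def p_at(mu):
        return chi2_p_value(chi_squared(values, sigmas, mu), dof)

    if p_at(best) < alpha:
        raise ValueError("p-value at the best guess is already below alpha")

    def offset(direction):
        lo, hi = 0.0, min(sigmas)
        while p_at(best + direction * hi) >= alpha:  # bracket the crossing
            lo, hi = hi, 2.0 * hi
        for _ in range(60):  # bisect on the crossing point
            mid = 0.5 * (lo + hi)
            if p_at(best + direction * mid) >= alpha:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    return best - offset(-1.0), best, best + offset(+1.0)


# Illustrative numbers only.
values = [10.1, 9.8, 10.3]
sigmas = [0.2, 0.2, 0.3]
lower, best, upper = error_bars(values, sigmas)
print(lower, best, upper)
```

Since $\chi^2$ is a parabola in the trial value, the two error bars come out symmetric about the best guess.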

I haven't done this in many years, so sorry for any mis-remembering. I think it's discussed in Bevington & Robinson.


You seem to be mixing up several concepts here.

In particular, you are interested in the error on a mean, you refer to the error on a sum (which you need on the way to the error on a mean), and you talk about the standard deviation.

(Everything here is the naive version, assuming zero correlation.)

  • Error of a sum of uncertain quantities: $X = \sum_i x_i$ and $\Delta X = \sqrt{ \sum_i \left(\Delta x_i\right)^2 }$

  • Error of the mean of uncertain quantities: Divide the sum by the number of measurements. The number of measurements is known exactly (you are not at all unsure how many figures you have processed), so this is just division by a constant. $$\bar{x} = \frac{X}{N} = \frac{\sum_i x_i}{N} \quad ,$$ and $$\Delta \bar{x} = \frac{\Delta X}{N} = \frac{\sqrt{\sum_i \left( \Delta x_i\right)^2}}{N} \quad.$$ (Note that later on you will encounter the phrase "error of the mean" in the context of large samples. That's different.)

    If your individual $\Delta x_i$s vary considerably, it is better to use the error-weighted mean.

  • Standard deviation: This is a figure expressing the dispersion of your $x_i$s, and it is computed without reference to your $\Delta x_i$s. It is generally written $\sigma$, and $\sigma^2$ is called the "variance". $$ \sigma^2 = \frac{1}{N} \sum_i \left( x_i - \bar{x} \right)^2 \quad ,$$ with a minor correction: if you have to get $\bar{x}$ from the same list (which you do), you use $$ \sigma^2 = \frac{1}{N-1} \sum_i \left( x_i - \bar{x} \right)^2 \quad .$$
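The three quantities in the list can be sketched in a few lines of Python. The function names and the numbers are invented for illustration, and the uncertainty returned for the error-weighted mean is the standard inverse-variance result, which is not spelled out in the list.

```python
import math
import statistics


def error_of_sum(errors):
    """Uncertainty of X = sum(x_i) for uncorrelated errors: add in quadrature."""
    return math.sqrt(sum(e**2 for e in errors))


def error_of_mean(errors):
    """Uncertainty of the plain mean: error of the sum divided by N."""
    return error_of_sum(errors) / len(errors)


def weighted_mean(values, errors):
    """Error-weighted mean and its uncertainty (weights 1 / error^2)."""
    weights = [1.0 / e**2 for e in errors]
    mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return mean, 1.0 / math.sqrt(sum(weights))


# Illustrative numbers only.
x = [9.8, 10.1, 10.4]
dx = [0.3, 0.4, 1.2]

print("sum:          ", sum(x), "+/-", error_of_sum(dx))
print("mean:         ", statistics.mean(x), "+/-", error_of_mean(dx))
print("weighted mean:", *weighted_mean(x, dx))
print("sample std:   ", statistics.stdev(x))  # N - 1 in the denominator
```

Note that `statistics.stdev` uses the $N-1$ form, while `statistics.pstdev` uses the $1/N$ form.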

If you have estimated your $\Delta x_i$s correctly then there should be a relationship between the standard deviation and the error on the mean, but that is for another day.


You complain in your question that the error of the mean does not go to the standard deviation in the limit that $\Delta x_i = 0$, but that is because they represent different concepts. It is possible to have an experiment where the individual measurements are drawn from a broad distribution but are each known very well (high standard deviation, but low individual uncertainties), or one where you re-measure the same underlying quantity with poor instruments (zero standard deviation, but large $\Delta x_i$s). In a lot of ways the two cases can be treated with the same math, but they are different.
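To make the contrast concrete, here is a small sketch with invented numbers for the two extreme cases; `error_of_mean` is a hypothetical helper implementing the quadrature formula from the list above.

```python
import math
import statistics


def error_of_mean(errors):
    """Quadrature sum of the individual errors, divided by N (no correlations)."""
    return math.sqrt(sum(e**2 for e in errors)) / len(errors)


# Case 1: a broad population, each member measured very precisely.
heights = [1.0, 5.0, 9.0]
height_errors = [0.001, 0.001, 0.001]

# Case 2: one underlying quantity, re-measured with a coarse instrument
# that happens to return the same reading each time.
readings = [5.0, 5.0, 5.0]
reading_errors = [2.0, 2.0, 2.0]

for name, x, dx in [("broad, precise", heights, height_errors),
                    ("same quantity, coarse", readings, reading_errors)]:
    print(name, "| std:", statistics.stdev(x), "| error of mean:", error_of_mean(dx))
```

The first case has a standard deviation of 4 with an error of the mean well below 0.001; the second has zero standard deviation with an error of the mean above 1.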