Is the Law of Large Numbers empirically proven?

Reading between the lines, it sounds like you are falling for the layman's misreading of the "law of averages": that if a coin comes up heads 10 times in a row, then it needs to come up tails more often from then on, in order to balance out that initial asymmetry.

The real point is that no divine presence needs to take corrective action in order for the average to stabilize. The simple reason is attenuation: once you've tossed the coin another 1000 times, the effect of those initial 10 heads has been diluted to mean almost nothing. What used to look like 100% heads is now a small blip, only strong enough to move the needle from 50% to roughly 50.5%.
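To put rough numbers on it (assuming the next 1000 tosses come out close to half heads):

$$\frac{10 + 500}{10 + 1000} \approx 50.5\%.$$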

Now combine this observation with the easily verified fact that 9900 out of 10000 heads is simply a less common combination than 5000 out of 10000. The reason for that is combinatorial: there is simply less freedom in hitting an extreme target than a moderate one.

To take a tractable example, suppose I ask you to flip a coin 4 times and get 4 heads. If you flip tails even once, you've failed. But if instead I ask you to aim for 2 heads, you still have options (albeit slimmer) no matter how the first two flips turn out. Numerically we can see that 2 out of 4 can be achieved in 6 ways: HHTT, HTHT, HTTH, THHT, THTH, TTHH. But the 4 out of 4 goal can be achieved in only one way: HHHH. If you work out the numbers for 9900 out of 10000 versus 5000 out of 10000 (or any specific number in that neighbourhood), that disparity becomes truly immense.
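If you want to see those counts rather than take them on faith, here is a small sketch (my own illustration, using Python's standard library; the log-gamma route just avoids printing integers with thousands of digits):

```python
from math import comb, lgamma, log

def log10_comb(n, k):
    # log10 of the binomial coefficient C(n, k), computed via log-gamma so the
    # astronomically large integers never need to be written out in full
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(10)

print(comb(4, 2), comb(4, 4))    # 6 ways vs. 1 way
print(log10_comb(10000, 5000))   # about 3008: C(10000, 5000) has over 3000 digits
print(log10_comb(10000, 9900))   # about 242: smaller by roughly 2766 orders of magnitude
```

Since every particular sequence of 10000 flips is equally likely, these counts alone tell you how much more probable the moderate outcome is than the extreme one.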

To summarize: it takes no conscious effort to get an empirical average to tend towards its expected value. In fact it would be fair to think in the exact opposite terms: the effect that requires conscious effort is forcing the empirical average to stray from its expectation.


Nice question! In the real world, we don't get to let $n \to \infty$, so the question of why the LLN should be of any comfort is important.

The short answer to your question is that we cannot empirically verify the LLN, since we can never perform an infinite number of experiments. It's a theoretical result that is very well founded, but, as with all applied mathematics, whether a particular model or theory holds in practice is a perennial concern.

More useful tools from a statistical standpoint are the Central Limit Theorem and the various probability inequalities (Chebyshev, Markov, Chernoff, etc.). These allow us to place bounds on, or approximate, the probability of our sample average being far from the true value for a finite sample.
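As a concrete sketch (my own numbers: a fair coin, 10,000 flips, and a tolerance of one percentage point), the two approaches give:

```python
from math import sqrt, erfc

n, p, eps = 10_000, 0.5, 0.01      # fair coin, tolerance of one percentage point
var_mean = p * (1 - p) / n         # variance of the sample proportion

# Chebyshev: P(|sample mean - p| >= eps) <= Var / eps^2  (crude, but assumption-free)
print(var_mean / eps**2)           # 0.25

# CLT approximation: the sample proportion is approximately Normal(p, var_mean)
z = eps / sqrt(var_mean)
print(erfc(z / sqrt(2)))           # about 0.0455, the two-sided normal tail beyond z = 2
```

Both say the deviation probability is controlled at a finite sample size; the CLT just says it far more sharply.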

As for an actual experiment to test the LLN: one can hardly do better than John Kerrich's 10,000-coin-flip experiment. He got 50.67% heads!
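For what it's worth, a quick simulation in the same spirit (my own sketch, not Kerrich's procedure; the seed is arbitrary) lands in the same neighbourhood:

```python
import numpy as np

rng = np.random.default_rng(1946)         # arbitrary seed; Kerrich flipped by hand
flips = rng.integers(0, 2, size=10_000)   # 0 = tails, 1 = heads
print(flips.mean())                       # typically within about a percentage point of 0.5
```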

So, in general, I would say the LLN is empirically well supported: scientists from all fields rely on sample averages to estimate models, and this approach has been largely successful, so sample averages appear to converge nicely at finite, feasible sample sizes.

There are "pathological" cases that one can construct (I'll spare you the details) where one needs astronomical sample sizes to get a reasonable probability of being close to the true mean. This is apparent if you are using the Central Limit Theorem, but the LLN is simply not informative enough to give me much comfort in day-to-day practice.

The physical basis for probability

It seems you still have an issue with why long-run averages exist in the real world at all, as opposed to what probability theory says about the behavior of those averages once we assume they exist. Let me state a fact that may help you:

Fact: Neither probability theory nor the existence of long-run averages requires randomness!

The determinism vs. indeterminism debate is for philosophers, not mathematicians. The notion of probability as a physical observable comes from our ignorance of, or the absence of, a detailed dynamics for what we are observing. You could just as easily apply probability theory to a boring ol' pendulum as to the stock market or coin flips...it's just that with pendulums we have a nice, detailed theory that allows us to make precise predictions of future observations. I have no doubt that a full physical analysis of a coin flip would allow us to predict which face would come up...but in reality, we will never know this!

This isn't an issue, though. We don't need to assume a guiding hand or true indeterminism to apply probability theory. Let's say coin flips are truly deterministic; we can still apply probability theory meaningfully if we assume a couple of basic things:

  1. The underlying process is *ergodic*...okay, this is a bit technical, but it basically means that the process dynamics are stable over the long term (e.g., we are not flipping coins in a hurricane, or where tornadoes pop in and out of the vicinity!). Note that I said nothing about randomness...this could be a totally deterministic, albeit very complex, process. All we need is that the dynamics are stable (i.e., we could write down a series of equations with specific parameters for the coin flips, and they wouldn't change from flip to flip).
  2. The values the process can take on at any time are "well behaved". Basically, as I said earlier with respect to the Cauchy, the system should not keep producing values comparable to (or exceeding) the sum of everything observed so far. It may happen once in a while, but it should become very rare, very fast (the precise condition is somewhat technical).

With these two assumptions, we now have a physical basis for the existence of a long-run average of a physical process. Now, if the process is complicated, then instead of using physics to model it exactly, we can apply probability theory to describe its statistical properties (i.e., its behavior aggregated over many observations).
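As a toy illustration of the first point (my own example, not from the original post): the logistic map $x_{n+1} = 4x_n(1 - x_n)$ is completely deterministic and chaotic, yet its running average settles down, because the dynamics are stable and the values are bounded in $[0, 1]$:

```python
# Deterministic but chaotic process: the logistic map x -> 4 x (1 - x).
# No randomness anywhere, yet the long-run average converges (to about 0.5,
# the mean of the map's invariant distribution).
x, total = 0.2, 0.0
n = 1_000_000
for _ in range(n):
    x = 4.0 * x * (1.0 - x)
    total += x
print(total / n)   # approximately 0.5
```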

Note that the above is independent of whether or not we have selected the correct probability model. Models are made to match reality...reality does not conform itself to our models. Therefore, it is the job of the modeler, not nature or divine providence, to ensure that the results of the model match the observed outcomes.

Hope this helps clarify when and how probability applies to the real world.


This isn't an answer, but I thought this group would appreciate it. Just to show that the behavior in the graph above is not universal, I plotted the sequence of sample averages for a standard Cauchy distribution for $n = 1, \ldots, 10^6$. Note how, even at extremely large sample sizes, the sample average jumps around.

If my computer weren't so darn slow, I could increase this by another order of magnitude and you'd not see any difference. The sample average for a Cauchy Distribution behaves nothing like that for coin flips, so one needs to be careful about invoking LLN. The expected value of your underlying process needs to exist first!
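Something along these lines reproduces the picture (a sketch of my own; the original post doesn't show its code, and the seed is arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 1_000_000
samples = rng.standard_cauchy(n)
running_mean = np.cumsum(samples) / np.arange(1, n + 1)   # sample average after each draw

plt.plot(running_mean)
plt.xlabel("n")
plt.ylabel("running sample average")
plt.title("Running average of standard Cauchy draws")
plt.show()
```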

[Figure: running sample average of standard Cauchy draws, $n = 1$ to $10^6$; it never settles down]

Response to OP concerns

I did not bring this example up to further concern you, but merely to point out that "averaging" does not always reduce the variability of an estimate. The vast majority of the time, we are dealing with phenomena that possess an expected value (e.g., coin tosses of a fair coin). However, the Cauchy is pathological in this regard, since it does not possess an expected value...so there is no number for your sample averages to converge to.

Now, many moons ago when I first encountered this fact, it blew my mind...and shook my confidence in statistics for a short time! However, I've come to be comfortable with this fact. At the intuitive level (and as many of the posters here have pointed out) what the LLN relies upon is the fact that no single outcome can consistently dominate the sample average...sure, in the first few tosses the outcomes do have a large influence, but after you've accumulated $10^6$ tosses, you would not expect the next toss to change your sample average from, say, 0.1 to 0.9, right? It's just not mathematically possible.
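To see why, note that for tosses taking values in $\{0, 1\}$ the running average can move by at most

$$\left|\bar{x}_{n+1} - \bar{x}_n\right| = \frac{\left|x_{n+1} - \bar{x}_n\right|}{n+1} \le \frac{1}{n+1},$$

so after $10^6$ tosses a single new toss shifts the average by at most about $10^{-6}$.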

Now enter the Cauchy distribution...it has the peculiar property that, no matter how many values you are currently averaging over, the absolute value of the next observation has a good (i.e., not vanishingly small; this part is somewhat technical, so maybe just accept the point) chance of being larger (much larger, in fact) than the sum of all the values observed so far...take a moment to think about this: it means that at any moment, your sample average can be converging to some number, then WHAM, it gets shot off in a different direction. This will happen infinitely often, so your sample average will never settle down the way it does for processes that possess an expected value (e.g., coin tosses, normally distributed variables, Poisson, etc.). Thus, you will never have an observed sum and an $n$ large enough to swamp the next observation.

I've asked @sonystarmap if he/she would mind calculating the sequence of medians, as opposed to the sequence of averages, in their post (similar to my post above, but with 100x more samples!). What you should see is that the median of a sequence of Cauchy random variables does converge in LLN fashion. This is because the Cauchy, like all random variables, does possess a median. This is one of the many reasons I like using medians in my work, where Normality is almost surely (sorry, couldn't help myself) false and there are extreme fluctuations. Not to mention that the sample median minimizes the average absolute deviation, when that expectation exists.

Second Addition: Cauchy DOES have a Median

To add another detail (read: wrinkle) to this story, the Cauchy does have a median, and so the sequence of sample medians does converge to the true median (i.e., $0$ for the standard Cauchy). To show this, I took the exact same sequence of standard Cauchy variates I used to make my first graph of the sample averages, took the first 20,000, and broke them up into four intervals of 5000 observations each (you'll see why in a moment). I then plotted the sequence of sample medians as the sample size approaches 5000 for each of the four independent sequences. Note the dramatic difference in convergence properties!
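A sketch of that procedure (my own code, with an arbitrary seed rather than the original sequence of variates):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.standard_cauchy(20_000).reshape(4, 5_000)   # four independent sequences of 5000 draws

for seq in x:
    # running median after each additional observation
    running_median = [np.median(seq[:k]) for k in range(1, len(seq) + 1)]
    plt.plot(running_median)

plt.xlabel("sample size")
plt.ylabel("running sample median")
plt.title("Running medians of four standard Cauchy sequences")
plt.show()
```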

This is another application of the law of large numbers, but to the sample median. Details can be seen here.

[Figure: running sample medians of four independent standard Cauchy sequences, sample sizes 1 to 5000, all converging toward 0]