What is the use of moments in statistics?

The central question in statistics is this: given a set of data, we would like to recover the random process that produced it (that is, the probability law of the population). This question is extremely difficult in general, and in the absence of strong assumptions on the underlying random process you really can't get very far (those who work in nonparametric statistics may disagree with me on this). A natural way to approach the problem is to look for simple objects that do identify the population distribution once we make some reasonable assumptions.

The question then becomes: what kind of objects should we look for? The best arguments I know for why we should look at the Laplace (or Fourier; I'll show you what this is in a second if you don't know) transform of the probability measure are a bit complicated, but naively we can draw a good heuristic from elementary calculus: given all the derivatives of an analytic function evaluated at zero, we know everything there is to know about the function through its Taylor series.

Suppose for a moment that the function $f(t) = E[e^{tX}]$ exists and is well behaved in a neighborhood of zero. It is a theorem that this function (when it exists and behaves nicely) uniquely identifies the probability law of the random variable $X$. If we Taylor expand what is inside the expectation and interchange the expectation with the sum, this becomes a power series in the moments of $X$: $f(t) = \sum_{k=0}^\infty \frac{1}{k!} t^k E[X^k]$, and so to completely identify the law of $X$ we just need to know the population moments. In effect we reduce the question above, "identify the population law of $X$", to the question "identify the population moments of $X$".
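As a quick numerical sanity check of that reduction, here is a minimal Python sketch (assuming a standard normal purely for concreteness, since its moments and MGF are both known in closed form) comparing the true MGF $e^{t^2/2}$ with the truncated power series built from the moments:

```python
import numpy as np
from math import factorial

K = 20                        # truncation order of the Taylor series
t = np.linspace(-1.0, 1.0, 5)

# Moments of the standard normal: E[X^k] = 0 for odd k and (k-1)!! for even k.
def normal_moment(k):
    if k % 2 == 1:
        return 0.0
    return factorial(k) / (2 ** (k // 2) * factorial(k // 2))

# Truncated series  f(t) ~ sum_k t^k / k! * E[X^k]
approx = sum(normal_moment(k) / factorial(k) * t**k for k in range(K + 1))

exact = np.exp(t**2 / 2)      # the true MGF of N(0, 1)
print(np.max(np.abs(approx - exact)))   # tiny: the moments recover the MGF
```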

It turns out that population moments, when they exist, are estimated extremely well by sample moments (by the law of large numbers), and you can even get a good sense of how far off from the true moments you can be under some often realistic assumptions. Of course, we can never estimate infinitely many moments with any degree of accuracy from a finite sample, so in practice we would want to do another round of approximation, but that is the general idea. For "nice" random variables, moments are enough to pin down the population law.
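To see this estimation at work, here is a small sketch (assuming an Exponential(1) population, whose $k$-th moment is $k!$, so the estimates can be checked against the truth):

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)

# Exponential(1) has k-th moment E[X^k] = k!, so we can check the estimates.
for n in (10**3, 10**5):
    x = rng.exponential(scale=1.0, size=n)
    sample = [np.mean(x**k) for k in range(1, 5)]   # sample moments
    truth = [factorial(k) for k in range(1, 5)]     # population moments
    print(n, [round(s, 2) for s in sample], truth)
```

The estimates of the low-order moments tighten quickly as $n$ grows; the higher the order, the larger the sample you need.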

I should mention that what I have said above is all heuristic and doesn't work in many interesting modern examples. In truth, I think the right answer to your question is that we don't strictly need moments, because for many relevant applications (particularly in economics) it seems unlikely that all moments even exist. The catch is that when you drop moment assumptions you lose an enormous amount of information and power: without at least two finite moments, the Central Limit Theorem fails, and with it go most of the elementary statistical tests. If you do not want to work with moments, there is a whole theory of nonparametric statistics that makes essentially no assumptions on the random process.
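A minimal sketch of what "the CLT fails" means in practice, assuming an Exponential(1) population (two finite moments) versus a Cauchy population (no moments at all):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 1000, 5000

# Finite variance: standardized means of Exponential(1) look normal.
exp_means = rng.exponential(size=(reps, n)).mean(axis=1)
z = (exp_means - 1.0) * np.sqrt(n)          # mean 1, variance 1 for Exponential(1)
print("Exponential:", np.mean(np.abs(z) > 1.96))        # ~ 0.05, as the CLT predicts

# No moments: the mean of n Cauchy draws is again standard Cauchy,
# so it never concentrates, no matter how large n is.
cauchy_means = rng.standard_cauchy(size=(reps, n)).mean(axis=1)
print("Cauchy:     ", np.mean(np.abs(cauchy_means) > 1.96))  # stays around 0.30
```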


Moments are constants of a population, such as the mean, variance, and so on. These constants help describe the characteristics of the population, and it is on the basis of these characteristics that a population is discussed.

Moments help in finding the arithmetic mean, standard deviation, and variance of the population directly, and they help in describing the graphical shape of the population.

We can think of moments as the constants used in determining the graphical shape of the distribution, since that shape also helps a great deal in characterizing a population; a small sketch of this follows below.
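For instance, a minimal Python sketch (assuming a Gamma(2, 1) sample purely for illustration) that turns the first four raw moments into the mean, variance, skewness, and excess kurtosis, i.e., the location, spread, and shape summaries mentioned above:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.gamma(shape=2.0, scale=1.0, size=100_000)   # a skewed population

# Raw sample moments m_k = (1/n) * sum x_i^k
m1, m2, m3, m4 = (np.mean(x**k) for k in range(1, 5))

mean = m1
var = m2 - m1**2
# Central moments written in terms of the raw moments:
mu3 = m3 - 3*m1*m2 + 2*m1**3
mu4 = m4 - 4*m1*m3 + 6*m1**2*m2 - 3*m1**4

skew = mu3 / var**1.5          # ~ 1.41 for Gamma(2, 1)
kurt = mu4 / var**2 - 3        # excess kurtosis, ~ 3 for Gamma(2, 1)
print(mean, var, skew, kurt)   # compare with 2, 2, 1.41, 3
```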


A possible intuition is as follows. Suppose that we have a random variable $X$ and consider the probability $P(|X|>x)$ for some $x>0$. We know that $P(|X|>x)\to0$ as $x\to\infty$ (we can prove this using the properties of the cumulative distribution function). But we might be interested in the rate of convergence. For example, can we say that $$x^pP(|X|>x)\to0$$ as $x\to\infty$ for some $p>0$? If $\operatorname E|X|^p<\infty$, then this is actually the case (it follows from $x^p P(|X|>x)\le\operatorname E\bigl[|X|^p\mathbf 1_{\{|X|>x\}}\bigr]\to0$). Conversely, if $x^pP(|X|>x)\to0$, then $\operatorname E|X|^q<\infty$ for each $q<p$. So moments establish a certain bound on the rate at which $P(|X|>x)$ decays, and this rate of convergence is important in many situations (for instance, the law of large numbers and the central limit theorem). Loosely speaking, the more moments we have, the less probable large values of the random variable are.
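To make this concrete, here is a minimal sketch (assuming a Pareto distribution with tail index $\alpha=3$, so that $\operatorname E|X|^p<\infty$ exactly when $p<\alpha$) showing how $x^pP(|X|>x)$ behaves for $p$ below and above the tail index:

```python
import numpy as np
from scipy.stats import pareto

alpha = 3.0                   # Pareto tail index: E|X|^p < inf iff p < alpha
x = np.logspace(0.5, 3, 6)    # a few large thresholds

for p in (2.0, 4.0):
    scaled = x**p * pareto(b=alpha).sf(x)   # x^p * P(X > x) = x^(p - alpha)
    print(p, np.round(scaled, 4))
# For p = 2 < alpha the scaled tail goes to 0 (the moment exists);
# for p = 4 > alpha it blows up (the 4th moment is infinite).
```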