Intuition behind using complementary CDF to compute expectation for nonnegative random variables

In the discrete case, if $X$ takes nonnegative integer values, $E[X] = \sum_{x=0}^\infty x P(X = x)$. That means we're adding up $P(X = 0)$ zero times, $P(X = 1)$ once, $P(X = 2)$ twice, etc. This can be represented in array form, where summing column-by-column gives exactly that total:

$$\begin{matrix} P(X=1) & P(X = 2) & P(X = 3) & P(X = 4) & P(X = 5) & \cdots \\ & P(X = 2) & P(X = 3) & P(X = 4) & P(X = 5) & \cdots \\ & & P(X = 3) & P(X = 4) & P(X = 5) & \cdots \\ & & & P(X = 4) & P(X = 5) & \cdots \\ & & & & P(X = 5) & \cdots\end{matrix}.$$

We could also add up these numbers row-by-row, though, and get the same result. The first row has everything but $P(X = 0)$ and so sums to $P(X > 0)$. The second row has everything but $P(X =0)$ and $P(X = 1)$ and so sums to $P(X > 1)$. In general, the sum of row $x+1$ is $P(X > x)$, and so adding the numbers row-by-row gives us $\sum_{x = 0}^{\infty} P(X > x)$, which thus must also be equal to $\sum_{x=0}^\infty x P(X = x) = E[X].$
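The row-versus-column argument can be checked on a small example. The sketch below uses a made-up pmf on $\{0, 1, 2, 3\}$ (the probabilities and the function names are assumptions chosen for illustration) and exact rational arithmetic, so the two totals can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical pmf on {0, 1, 2, 3}, chosen only for illustration.
pmf = {
    0: Fraction(1, 10),
    1: Fraction(2, 10),
    2: Fraction(3, 10),
    3: Fraction(4, 10),
}


def expectation_by_columns(pmf):
    """Sum x * P(X = x): column x of the array holds P(X = x) exactly x times."""
    return sum(x * p for x, p in pmf.items())


def expectation_by_rows(pmf):
    """Sum P(X > x) over x = 0, 1, ...: row x + 1 of the array sums to P(X > x)."""
    largest = max(pmf)
    return sum(
        sum(p for value, p in pmf.items() if value > x)
        for x in range(largest)  # P(X > x) is zero once x reaches the largest value
    )


column_total = expectation_by_columns(pmf)
row_total = expectation_by_rows(pmf)
print(column_total, row_total)
```

Both totals agree because they add the same array entries in a different order.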

The continuous case is analogous.

In general, switching the order of summation (as in the proof the OP links to) can always be interpreted as adding row-by-row vs. column-by-column.


A hint and a proof.

Hint: if $X=x$ with full probability, the integral is the integral of $1$ on $(0,x)$, hence the LHS and the RHS are both $x$.

Proof: apply (Tonelli-)Fubini to the function $(\omega,x)\mapsto\mathbf 1_{X(\omega)>x}$ and to the sigma-finite measure $P\otimes\mathrm{Leb}$ on $\Omega\times\mathbb R_+$. One gets $$ \int_\Omega\int_{\mathbb R_+}\mathbf 1_{X(\omega)>x}\mathrm dx\mathrm dP(\omega)=\int_\Omega\int_0^{X(\omega)}\mathrm dx\mathrm dP(\omega)=\int_\Omega X(\omega)\mathrm dP(\omega)=E(X), $$ while, using the shorthand $A_x=\{\omega\in\Omega\mid X(\omega)>x\}$, $$ \int_{\mathbb R_+}\int_\Omega\mathbf 1_{X(\omega)>x}\mathrm dP(\omega)\mathrm dx=\int_{\mathbb R_+}\int_\Omega\mathbf 1_{\omega\in A_x}\mathrm dP(\omega)\mathrm dx=\int_{\mathbb R_+}P(A_x)\mathrm dx=\int_{\mathbb R_+}P(X>x)\mathrm dx. $$
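As a numerical sanity check of the identity (not a substitute for the proof), here is a minimal sketch using only the standard library. It approximates $\int_0^\infty P(X>x)\,\mathrm dx$ with a midpoint rule on a truncated interval, first for the point mass of the hint and then for an exponential distribution. The helper `tail_integral`, the point-mass value, the rate, and the truncation bounds are all assumptions made for illustration.

```python
import math


def tail_integral(survival, upper, steps=200_000):
    """Midpoint-rule approximation of the integral of survival(x) over [0, upper]."""
    width = upper / steps
    return width * sum(survival((k + 0.5) * width) for k in range(steps))


# Case of the hint: X = 3 with full probability, so P(X > x) is 1 on (0, 3) and 0 after.
point_value = 3.0
point_mass_mean = tail_integral(lambda x: 1.0 if x < point_value else 0.0, upper=10.0)

# Exponential distribution with a hypothetical rate: P(X > x) = exp(-rate * x), E[X] = 1 / rate.
rate = 2.0
exponential_mean = tail_integral(lambda x: math.exp(-rate * x), upper=40.0)

print(point_mass_mean, point_value)
print(exponential_mean, 1.0 / rate)
```

The truncation at a finite upper bound is harmless here because the tail probability is zero or negligibly small beyond it; for heavy-tailed distributions a larger bound or a different quadrature would be needed.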


Since the intuition behind the result is requested, let us consider a simple case of a discrete non-negative random variable taking on the three values $x_0 = 0$, $x_1$, and $x_2$, where $0 < x_1 < x_2$, with probabilities $p_0$, $p_1$, and $p_2$. The cumulative distribution function (CDF) $F(x)$ is thus a staircase function $$F(x) = \begin{cases} 0, & x < 0, \\ p_0, & 0 \leq x < x_1,\\ p_0 + p_1, & x_1 \leq x < x_2,\\ 1, & x \geq x_2, \end{cases}$$ with jumps of $p_0$, $p_1$, and $p_2$ at $0$, $x_1$, and $x_2$ respectively. Note also that $$ E[X]= \sum_{i=0}^2 p_ix_i = p_1x_1 + p_2x_2. $$

Now, notice that $$\int_0^\infty P\{X > x\}\mathrm dx = \int_0^\infty [1 - F(x)]\mathrm dx$$ is the area of the region bounded by the curve $F(x)$, the vertical axis, and the line at height 1 above the horizontal axis. Standard Riemann integration techniques say that we should divide the region into narrow vertical strips, compute the area of each, take the sum, take limits, and so on. In our example, of course, all this can be bypassed since the region in question is the union of two adjoining non-overlapping rectangles: one of base $x_1$ and height $(1-p_0)$, and the other of base $x_2 - x_1$ and height $(1-p_0-p_1)$.

But suppose we instead divide the same region into two different adjoining non-overlapping rectangles, with the second lying above the first. The first rectangle has base $x_1$ and height $p_1$, while the second (lying above the first) has broader base $x_2$ and height $p_2$. The total area that we seek is easily seen to be $p_1x_1 + p_2x_2 = E[X]$.

Thus, for a non-negative random variable, $E[X]$ can be interpreted as the area of the region lying above its CDF $F(x)$ and below the line at height 1 to the right of the origin. The standard formula $$E[X] = \int_0^\infty x\mathrm dF(x)$$ can be thought of as computing this area by dividing it into thin horizontal strips of length $x$ and height $dF(x)$, while $$\int_0^\infty P\{X > x\}\mathrm dx = \int_0^\infty [1 - F(x)]\mathrm dx$$ (in the Riemann integral sense) can be thought of as computing the area by dividing it into thin vertical strips.
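To make the two decompositions concrete, the following sketch computes the area both ways for the three-valued example. The specific numbers assigned to $x_1$, $x_2$, $p_0$, $p_1$, $p_2$ are hypothetical, picked only so the arithmetic is easy to follow.

```python
from fractions import Fraction

# Hypothetical values and probabilities for the three-point example (x_0 = 0).
x1, x2 = Fraction(2), Fraction(5)
p0, p1, p2 = Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)
assert p0 + p1 + p2 == 1

# Side-by-side rectangles: integrate 1 - F(x) over [0, x1) and [x1, x2).
vertical_area = x1 * (1 - p0) + (x2 - x1) * (1 - p0 - p1)

# Stacked rectangles: base x1 with height p1, then base x2 with height p2.
horizontal_area = x1 * p1 + x2 * p2

# Direct definition of the expectation, including the x_0 = 0 term.
expectation = 0 * p0 + x1 * p1 + x2 * p2

print(vertical_area, horizontal_area, expectation)
```

All three quantities coincide, since they measure the same region above the CDF and below height 1.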

More generally, if $X$ takes on both positive and negative values, $$E[X] = \int_0^\infty [1 - F(x)]\mathrm dx - \int_{-\infty}^0 F(x) \mathrm dx$$ with similar interpretations.
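The signed version can be checked the same way on a small discrete distribution. In this sketch the pmf and the helper names `cdf` and `mean_from_cdf` are assumptions for illustration; because the CDF of a discrete variable is a step function, both integrals reduce to exact sums of rectangle areas.

```python
from fractions import Fraction

# Hypothetical distribution taking both negative and positive values.
pmf = {
    Fraction(-2): Fraction(1, 4),
    Fraction(1): Fraction(1, 2),
    Fraction(3): Fraction(1, 4),
}


def cdf(pmf, x):
    """F(x) = P(X <= x) for a finite discrete distribution."""
    return sum(p for value, p in pmf.items() if value <= x)


def mean_from_cdf(pmf):
    """Integral of 1 - F over [0, inf) minus integral of F over (-inf, 0]."""
    # F is constant between consecutive breakpoints; including 0 keeps each
    # interval on one side of the origin.
    points = sorted(set(pmf) | {Fraction(0)})
    positive_part = Fraction(0)
    negative_part = Fraction(0)
    for left, right in zip(points, points[1:]):
        level = cdf(pmf, left)  # value of F on [left, right)
        if left >= 0:
            positive_part += (right - left) * (1 - level)
        else:
            negative_part += (right - left) * level
    # Outside the support F is 0 (left) or 1 (right), so nothing else contributes.
    return positive_part - negative_part


direct_mean = sum(value * p for value, p in pmf.items())
cdf_mean = mean_from_cdf(pmf)
print(direct_mean, cdf_mean)
```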