Why is expectation defined by $\int xf(x)dx$?

Let $(\Omega,\mathcal{F},P)$ be a probability space and $X:\Omega\to\mathbb{R}$ a random variable, i.e. an $(\mathcal{F},\mathcal{B}(\mathbb{R}))$-measurable mapping. Then $X$ induces a probability measure on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ defined by $$ P_X(B):=P(X^{-1}(B)),\quad B\in\mathcal{B}(\mathbb{R}), $$ which is well-defined since $X$ is measurable. This is called the distribution of $X$, or the pushforward measure of $P$ under $X$.

The expectation of $X$ is defined as the following Lebesgue integral over $\Omega$: $$ {\rm E}[X]:=\int_{\Omega} X\,\mathrm dP=\int_\Omega X(\omega)\,P(\mathrm d\omega), $$ provided this integral exists. This integral can always be transformed into a Lebesgue integral on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$. The following holds:

For any integrable random variable $X$ one has $$ {\rm E}[X]=\int_{\mathbb{R}} x\,P_X(\mathrm dx).\tag{1} $$
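
To see $(1)$ and the pushforward construction in action, here is a minimal numerical sketch on a finite sample space (the four-point space and the values of $X$ are invented for illustration): integrating $X$ over $\Omega$ and integrating the identity against $P_X$ give the same number.

```python
# Sketch: a four-point sample space Omega (invented for illustration).
# X maps two outcomes to the same value, so the pushforward P_X
# genuinely aggregates probability mass.
from collections import defaultdict

P = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}   # P({omega}) on Omega
X = {"a": 1.0, "b": 2.0, "c": 2.0, "d": 5.0}   # the random variable

# E[X] = integral of X over Omega with respect to P
E_omega = sum(X[w] * P[w] for w in P)

# Pushforward measure: P_X({x}) = P(X^{-1}({x}))
P_X = defaultdict(float)
for w in P:
    P_X[X[w]] += P[w]

# E[X] = integral of x over R with respect to P_X, as in (1)
E_pushforward = sum(x * p for x, p in P_X.items())

print(E_omega, E_pushforward)  # both 3.1
```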

In the special case where $X$ admits a density function, i.e. $P_X(B)=P(X\in B)=\int_B f_X(x)\,\mathrm dx$ for all $B\in\mathcal{B}(\mathbb{R})$ and for some measurable, non-negative function $f_X$, we can simplify $(1)$ even further: $$ {\rm E}[X]=\int_{\mathbb{R}}xf_X(x)\,\mathrm dx. \tag{2} $$
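
As a quick sanity check of $(2)$ (a numerical sketch, not part of the argument; the choice of an exponential density is mine), one can compare a Monte Carlo estimate of ${\rm E}[X]$ with a numerical evaluation of $\int x f_X(x)\,\mathrm dx$:

```python
# Check (2) for X ~ Exp(lam): both the sample mean of X and the
# integral of x * f_X(x) should be close to 1 / lam.
import numpy as np
from scipy.integrate import quad

lam = 2.0
f = lambda x: lam * np.exp(-lam * x)   # density of Exp(lam) on [0, inf)

# Right-hand side of (2), evaluated numerically
integral, _ = quad(lambda x: x * f(x), 0, np.inf)

# Monte Carlo estimate of E[X]
rng = np.random.default_rng(0)
sample_mean = rng.exponential(scale=1 / lam, size=1_000_000).mean()

print(integral, sample_mean)  # both ~ 0.5
```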

A standard technique for proving $(1)$ and $(2)$ is to establish the more general identity ${\rm E}[g(X)]=\int_{\mathbb{R}} g\,\mathrm dP_X$ for measurable $g$ by a) showing that it holds for indicator functions, i.e. $g=\mathbf{1}_B$ for $B\in\mathcal{B}(\mathbb{R})$, b) showing that if it holds for $g$ and $h$ then it also holds for $\alpha g+h$, $\alpha\in\mathbb{R}$ (hence for all simple functions), and c) showing that if it holds for a non-decreasing sequence $(g_n)$ of non-negative functions then it also holds for $\lim g_n$ (monotone convergence). Taking $g(x)=x$ then yields $(1)$.
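
Step a) is immediate from the definition of the pushforward measure: for $g=\mathbf{1}_B$ with $B\in\mathcal{B}(\mathbb{R})$, $$\int_\Omega \mathbf{1}_B(X(\omega))\,P(\mathrm d\omega)=P(X\in B)=P_X(B)=\int_{\mathbb{R}}\mathbf{1}_B(x)\,P_X(\mathrm dx).$$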


If you start with a probability space $\left(\Omega,\mathcal{F},\mathbb{P}\right)$ and a random variable $X:\Omega\rightarrow\mathbb{R}$, where $\mathbb{R}$ is equipped with the $\sigma$-algebra $\mathcal{B}$ of Borel sets, then $E\left(X\right)$ is denoted by $\int X\,d\mathbb{P}$ or $\int X\left(\omega\right)\mathbb{P}\left(d\omega\right)$. A probability measure $P$ is induced by $X$ on $\left(\mathbb{R},\mathcal{B}\right)$ via $P\left(A\right)=\mathbb{P}\left(X^{-1}\left(A\right)\right)$.

Now start with the probability space $\left(\mathbb{R},\mathcal{B},P\right)$ and define $Y:\mathbb{R}\rightarrow\mathbb{R}$ by $x\mapsto x$. Then $X$ and $Y$ have the same distribution (the pushforward of $P$ under the identity map is $P$ itself), so $E\left(X\right)=E\left(Y\right)$. Applying the technique mentioned above we find $E\left(X\right)=E\left(Y\right)=\int Y\,dP=\int Y\left(x\right)P\left(dx\right)$. Since $Y\left(x\right)=x$, this is also written as $\int x\,P\left(dx\right)$ or $\int x\,dF\left(x\right)$, where $F\left(x\right)=\mathbb{P}\left(X\leq x\right)$ is the distribution function of $X$. If $F\left(x\right)=\int_{-\infty}^{x}f\left(t\right)\,dt$ for some $f$, then also $E\left(X\right)=\int xf\left(x\right)\,dx$.
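
For a concrete instance of the Stieltjes form $\int x\,dF\left(x\right)$ (the fair-die example here is mine, not part of the argument above): when $F$ is a step function, the integral reduces to summing $x$ times the jump of $F$ at $x$.

```python
# Sketch: for a fair die, F jumps by 1/6 at x = 1, ..., 6, so the
# Stieltjes integral  int x dF(x)  is the sum of x * (F(x) - F(x-)).
jumps = {x: 1 / 6 for x in range(1, 7)}   # jump of F at each atom
E = sum(x * dF for x, dF in jumps.items())
print(E)  # 3.5
```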


This is not true in general: the distribution of $X$ must be absolutely continuous with respect to Lebesgue measure. I should probably elaborate a bit:

Theorem (Radon–Nikodym): Let $\mu$ be a $\sigma$-finite measure on a measurable space $(X,S)$, and $\nu$ a finite measure which is absolutely continuous with respect to $\mu$. Then there is an $h\in L^1(X,S,\mu)$ with $$\nu(E)=\int_E h\, d\mu$$ for all $E\in S$. (Any two such $h$ are equal $\mu$-a.e.) The function $h$ is called the Radon–Nikodym derivative, or density, of $\nu$ with respect to $\mu$, and is denoted $d\nu/d\mu$.
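
As an illustration (my example, not needed for the argument): take $\mu$ to be Lebesgue measure on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ and $\nu$ the exponential distribution $\nu(E)=\int_E \lambda e^{-\lambda x}\mathbf{1}_{[0,\infty)}(x)\,dx$. Then $\nu$ is absolutely continuous with respect to $\mu$ and $$\frac{d\nu}{d\mu}(x)=\lambda e^{-\lambda x}\mathbf{1}_{[0,\infty)}(x),$$ i.e. the Radon–Nikodym derivative is exactly the usual density function.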

This implies that if $f\in L^1(\nu)$ then $$\int f\,\frac{d\nu}{d\mu}\, d\mu=\int f\,d\nu,$$ which includes as a special case what you are looking for: take $\mu$ to be Lebesgue measure, $\nu$ the distribution of $X$, and $f(x)=x$.
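
The identity can also be checked numerically (a sketch under assumptions of my choosing: $f(x)=x^2$ and $\nu$ the $\mathrm{Exp}(2)$ distribution). The left-hand side is an ordinary integral against the density; the right-hand side is approximated by averaging $f$ over samples drawn from $\nu$.

```python
# Check  int f (dnu/dmu) dmu = int f dnu  for f(x) = x^2 and
# nu = Exp(lam); both sides should be close to 2 / lam**2.
import numpy as np
from scipy.integrate import quad

lam = 2.0
h = lambda x: lam * np.exp(-lam * x)   # dnu/dmu, with mu = Lebesgue measure
f = lambda x: x ** 2

# Left side: integral of f * (dnu/dmu) with respect to Lebesgue measure
lhs, _ = quad(lambda x: f(x) * h(x), 0, np.inf)

# Right side: integral of f with respect to nu, via Monte Carlo
rng = np.random.default_rng(1)
rhs = f(rng.exponential(scale=1 / lam, size=1_000_000)).mean()

print(lhs, rhs)  # both ~ 0.5
```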