Why is the accepted definition of random variable what it is?

This is the approach behind "free probability". Apparently it makes random matrices easier to treat. Terry Tao has a nice introduction here.


On rereading, I realise I was not really addressing the question, but I will let this stand anyway, as part of it may be relevant.


I'm not sure what the historical reasons were, so I'll just give one good reason (which may well be related to the original one).

If you have just a single random variable, $X$, it would be easier to say that it is described by a distribution (i.e. a probability measure) on the real numbers. Indeed, this is what is usually done in that case.
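For concreteness, a standard example: a single roll of a fair die is completely described by the measure

$$P_X \;=\; \tfrac{1}{6}\sum_{k=1}^{6}\delta_k \qquad\text{on } \mathbb{R}, \qquad P_X(A)=\Pr(X\in A),$$

with no underlying sample space $\Omega$ mentioned at all.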

Next, make another measurement, represented by a random variable $Y$. Now, the pair $(X,Y)$ can be represented by a distribution on $\mathbb{R}^2$. Again, no need to complicate things by saying that this is the image of the probability distribution $P$ on a probability space $\Omega$ under a map $(X,Y):\Omega\rightarrow\mathbb{R}^2$.

However, note that when going from one measurement, $X$, to two measurements, $X$ and $Y$, we had to change the probability space from $\mathbb{R}$ to $\mathbb{R}^2$.

If we introduce more measurements, we have to replace the probability space with an even bigger one. In the process, we also have to specify how the probability distribution of $X$ corresponds to the marginal distribution of $(X,Y)$. While possible, this is cumbersome and unnatural.
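To spell out the bookkeeping (writing $P_X$ and $P_{(X,Y)}$ for the two distributions, notation introduced here just for illustration), the consistency requirement is

$$P_X(A) \;=\; P_{(X,Y)}(A\times\mathbb{R}) \qquad\text{for every Borel set } A\subseteq\mathbb{R},$$

and a similar condition has to be imposed again each time a further measurement is added.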

If, instead, we say that we have a probability space, $\Omega$, and that $X,Y,Z,\ldots$ are different measurable functions on $\Omega$, it immediately follows that the marginal distributions of $(X,Y)$ will be the distributions of $X$ and $Y$ respectively.
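Indeed, writing $P_X$ for the distribution of $X$ (i.e. the image of $P$ under $X$), the marginal computation is a one-liner:

$$P_{(X,Y)}(A\times\mathbb{R}) \;=\; P\bigl(\{\omega : X(\omega)\in A,\ Y(\omega)\in\mathbb{R}\}\bigr) \;=\; P\bigl(\{\omega : X(\omega)\in A\}\bigr) \;=\; P_X(A),$$

so the compatibility that previously had to be imposed by hand now holds by construction.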

Thus, instead of introducing a new probability space every time we have a new set of random variables, which would get tricky, we can make do with just one (big) probability space, with potentially many different measurable functions on it defining the random variables.
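A well-known construction illustrating how far a single space can go: take $\Omega=[0,1]$ with Lebesgue measure and let $X_n(\omega)$ be the $n$-th binary digit of $\omega$, i.e.

$$X_n(\omega) \;=\; \lfloor 2^n\omega\rfloor \bmod 2, \qquad \Pr(X_1=\varepsilon_1,\ldots,X_n=\varepsilon_n)=2^{-n}\ \text{ for all } \varepsilon_i\in\{0,1\}.$$

This one probability space carries an entire infinite sequence of independent fair coin flips, each of them just a measurable function on $[0,1]$.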