Why do probabilists take random variables to be Borel (and not Lebesgue) measurable?

One should be careful with the definitions here. Notation: given measurable spaces $(X, \mathcal{B}_X), (Y, \mathcal{B}_Y)$, a measurable map $f : X \to Y$ is one such that $f^{-1}(A) \in \mathcal{B}_X$ for every $A \in \mathcal{B}_Y$. To be explicit, I'll say $f$ is $(\mathcal{B}_X, \mathcal{B}_Y)$-measurable.

Let $\mathcal{B}$ be the Borel $\sigma$-algebra on $\mathbb{R}$, so the Lebesgue $\sigma$-algebra $\mathcal{L}$ is its completion with respect to Lebesgue measure $m$. Then for functions $f : \mathbb{R} \to \mathbb{R}$, "Borel measurable" means $(\mathcal{B}, \mathcal{B})$-measurable, while "Lebesgue measurable" means $(\mathcal{L},\mathcal{B})$-measurable; note the asymmetry! Already this notion has some defects; for instance, if $f,g$ are Lebesgue measurable, $f \circ g$ need not be, even if $g$ is continuous. (See Exercise 2.9 in Folland's Real Analysis.)

$(\mathcal{L}, \mathcal{L})$-measurable functions are not so useful; for instance, a continuous function need not be $(\mathcal{L}, \mathcal{L})$-measurable. (The $g$ from the aforementioned exercise is an example.) $(\mathcal{B}, \mathcal{L})$ is even worse.
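For concreteness, here is a sketch of the standard construction behind that exercise (my notation; details are in Folland):

```latex
\begin{itemize}
  \item Let $\varphi$ be the Cantor--Lebesgue function and let
        $\psi(x) = \varphi(x) + x$, a homeomorphism of $[0,1]$ onto $[0,2]$
        that maps the Cantor set $C$ onto a set $\psi(C)$ of measure $1$.
  \item Pick a non-Lebesgue-measurable $N \subset \psi(C)$ and set
        $B = \psi^{-1}(N) \subset C$; then $B$ is a null set, so
        $B \in \mathcal{L}$, but $B \notin \mathcal{B}$.
  \item Take $g = \psi^{-1}$ (continuous) and $f = \chi_B$
        (Lebesgue measurable). Then
        $(f \circ g)^{-1}(\{1\}) = g^{-1}(B) = \psi(B) = N \notin \mathcal{L}$,
        so $f \circ g$ is not Lebesgue measurable; the same computation,
        $g^{-1}(B) = N \notin \mathcal{L}$, shows the continuous $g$ is not
        $(\mathcal{L}, \mathcal{L})$-measurable.
\end{itemize}
```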

Given a probability space $(\Omega, \mathcal{F},P)$, our random variables are $(\mathcal{F}, \mathcal{B})$-measurable functions $X : \Omega \to \mathbb{R}$. The Lebesgue $\sigma$-algebra $\mathcal{L}$ does not appear. As mentioned, it would not be useful to consider $(\mathcal{F}, \mathcal{L})$-measurable functions; there simply may not be enough good ones, and they may not be preserved by composition with continuous functions. Anyway, the right analogue of "Lebesgue measurable" would be to use the completion of $\mathcal{F}$ with respect to $P$, and this is commonly done. Indeed, many theorems assume a priori that $\mathcal{F}$ is complete.

Note that, for similar reasons as above, we should expect $f(X)$ to be another random variable when $f$ is Borel measurable, but not when $f$ is Lebesgue measurable. Using $(\mathcal{F}, \mathcal{L})$ in our definition of "random variable" would not avoid this, either.

The moral is this: To get as many $(\mathcal{B}_X, \mathcal{B}_Y)$-measurable functions $f : X \to Y$ as possible, one wants $\mathcal{B}_X$ to be as large as possible, so it makes sense to use a complete $\sigma$-algebra there. (You already know some of the nice properties of this, e.g. an a.e. limit of measurable functions is measurable.) But one wants $\mathcal{B}_Y$ to be as small as possible. When $Y$ is a topological space, we usually want to be able to compose $f$ with continuous functions $g : Y \to Y$, so $\mathcal{B}_Y$ had better contain the open sets (and hence the Borel $\sigma$-algebra), but we should stop there.


One reason is that probabilists often consider more than one measure on the same space, and a set that is negligible for one measure (and hence added in its completion) might not be negligible for another. The situation becomes more acute when you consider uncountably many different measures (such as the distributions of a Markov process with different starting points).

Another reason is that probabilists often need to consider projections of events: instead of asking whether Brownian motion (say) has some property at a fixed time $t$, we would like to know whether there exists a time at which Brownian motion has that property. Projections of Borel sets in a Polish space are analytic (also known as Suslin) sets, and these sets are universally measurable (i.e., measurable in the completion of any Borel measure); a good source for this is [1]. In contrast, projections of Lebesgue measurable sets may fail to be Lebesgue measurable, which hinders further analysis.

[1] Arveson, William. An invitation to C*-algebras. Vol. 39. Springer Science & Business Media, 2012.


Borel measurable functions are much nicer to deal with. Every continuous function is Borel measurable, but a continuous function need not be $(\mathcal{L},\mathcal{L})$-measurable: the preimage of a Lebesgue measurable set under a continuous function may fail to be Lebesgue measurable. Moreover, Borel measurable functions are very well behaved when it comes to conditioning (this is the Doob–Dynkin lemma): if $f:(X,\Sigma)\to\mathbb{R}$ is measurable, then a function $g:X\to\mathbb{R}$ is measurable with respect to $\sigma(f)$ if and only if there exists a Borel measurable function $h:\mathbb{R}\to\mathbb{R}$ such that $g=h\circ f$.
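On a finite sample space the factorization can be carried out by hand. The following toy sketch (all names are my own, purely for illustration) builds $h$ from $f$ and a $\sigma(f)$-measurable $g$ by reading off the value of $g$ on each level set of $f$:

```python
# Toy illustration: if g is constant on the level sets of f (i.e. g is
# sigma(f)-measurable), we may define h(y) = "the value of g on {f = y}",
# and then g = h o f pointwise.

def factor_through(f, g, omega):
    """Return h with g = h o f, assuming g is constant on level sets of f."""
    h = {}
    for w in omega:
        y = f(w)
        if y in h and h[y] != g(w):
            raise ValueError("g is not sigma(f)-measurable")
        h[y] = g(w)
    return lambda y: h[y]

omega = range(8)
f = lambda w: w % 2        # parity of w
g = lambda w: 3 * (w % 2)  # depends on w only through f(w)

h = factor_through(f, g, omega)
assert all(g(w) == h(f(w)) for w in omega)  # g factors as h o f
```

Of course, the content of the lemma is that this works in full generality for Borel measurable $h$, not just in finite toy cases.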

On a more conceptual note, the fewer measurable sets you have in your codomain, the easier it is for a function to be measurable. And if a random variable should represent a random quantity, then all empirically interesting questions can be formulated in terms of simple intervals and their combinations. For, say, statistical applications there is no empirical difference between a Borel set and that Borel set modified by a null set. The distributions (on the reals) commonly applied can usually be given by a cumulative distribution function, and such a function determines the probabilities of all intervals.
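As a small illustration of the last point (a sketch; the choice of the normal distribution and the `erf`-based CDF formula are mine), interval probabilities come straight from the CDF via $P(a < X \le b) = F(b) - F(a)$:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF: F(x) = (1/2)(1 + erf((x - mu) / (sigma * sqrt(2))))."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def interval_prob(a, b, cdf=normal_cdf):
    """P(a < X <= b) = F(b) - F(a): the CDF determines interval probabilities."""
    return cdf(b) - cdf(a)

# About 68% of the mass of a standard normal lies within one sigma.
p = interval_prob(-1.0, 1.0)
assert abs(p - 0.6827) < 1e-3
```

Every question about intervals, and hence (by generation) about Borel sets, is answered by $F$ alone; the completion adds nothing empirically visible.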