Self-contained formalization of random variables?

$\newcommand\Om\Omega\newcommand\ga\gamma$ I think all you need to do is clarify/formalize the terms you are using.

Given a measurable space $(X,E)$, let us say that random variables (r.v.'s) $A_1$ and $A_2$ with values in $(X,E)$ defined on probability spaces $(\Om_1,F_1,P_1)$ and $(\Om_2,F_2,P_2)$ are equivalent if they have the same distributions (that is, pushforward measures): $P_1A_1^{-1}=P_2A_2^{-1}$.

Then, for each measurable space $(X,E)$, there is a natural one-to-one correspondence between the set of all probability spaces $(X,E,\mu)$ over the given measurable space $(X,E)$ and the set of all equivalence classes of r.v.'s with values in $(X,E)$. This follows because, for any probability space $(X,E,\mu)$, the identity map of $X$ is a r.v. defined on the probability space $(X,E,\mu)$ with values in $(X,E)$, and the distribution of this identity map is $\mu$.
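To see this correspondence at work on a toy example, here is a minimal Python sketch (the function `pushforward` and the particular measures are mine, purely for illustration): equivalence is equality of pushforwards, and the identity map is the canonical representative of its equivalence class.

```python
from collections import Counter
from fractions import Fraction

def pushforward(P, A):
    """The distribution P A^{-1} of A: the pushforward of P under A."""
    dist = Counter()
    for omega, p in P.items():
        dist[A(omega)] += p
    return dict(dist)

# Two different probability spaces ...
P1 = {0: Fraction(1, 2), 1: Fraction(1, 2)}    # fair coin on X = {0, 1}
P2 = {k: Fraction(1, 6) for k in range(1, 7)}  # fair die on {1, ..., 6}

A1 = lambda w: w        # the identity map: the canonical representative
A2 = lambda w: w % 2    # parity of the die

# ... carrying equivalent r.v.'s: both pushforwards equal {0: 1/2, 1: 1/2}.
assert pushforward(P1, A1) == pushforward(P2, A2)
```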

Now, when you say "we want to define another random variable $B$ that depends on $A$" (in a certain way), the only natural interpretation of this seems to be the following: you have/know the probabilities of the "joint events" of the form $\{A\in S,B\in T\}:=(A,B)^{-1}(S\times T)$ for some measurable spaces $(X,E)$ and $(Y,F)$, all $S\in E$, and all $T\in F$. In other words, you have/know the "joint" distribution (say $\ga$) of a random pair $(A,B)$ in some measurable space of the product form $(X\times Y,E\otimes F)$, and you want to have a probability space on which a random pair $(A,B)$ with distribution $\ga$ is to be defined.

Well, then you need to do almost nothing: as in the previous paragraph, just let $(A,B)$ be the identity map of $X\times Y$. Then $(A,B)$ will be a r.v. defined on the probability space $(X\times Y,E\otimes F,\ga)$ with values in $(X\times Y,E\otimes F)$, and the distribution of this identity map will be $\ga$. Added: In particular, each of the so-defined r.v.'s $A$ and $B$ will be defined on the probability space $(X\times Y,E\otimes F,\ga)$: $A$ is the map $X\times Y\ni(x,y)\mapsto x\in X$ and $B$ is the map $X\times Y\ni(x,y)\mapsto y\in Y$.
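Here is the same canonical construction as a discrete Python sketch (the particular $\ga$ is made up for illustration): the probability space is $(X\times Y,\ga)$ itself, and $A$ and $B$ are the coordinate projections, so that the joint distribution of $(A,B)$ is $\ga$ by construction.

```python
from fractions import Fraction

# A joint distribution gamma on X x Y, where X = {'a', 'b'} and Y = {0, 1}.
gamma = {
    ('a', 0): Fraction(1, 4), ('a', 1): Fraction(1, 4),
    ('b', 0): Fraction(1, 2), ('b', 1): Fraction(0),
}

# The probability space is (X x Y, gamma) itself; A and B are projections.
A = lambda pair: pair[0]
B = lambda pair: pair[1]

# The marginal of A, i.e. the pushforward gamma A^{-1}:
marginal_A = {}
for (x, y), p in gamma.items():
    marginal_A[x] = marginal_A.get(x, 0) + p
print(marginal_A)   # {'a': Fraction(1, 2), 'b': Fraction(1, 2)}
```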

Similarly one can deal with any family of r.v.'s in place of a random pair $(A,B)$.


A short summary: once you have the joint distribution of all your random variables, you automatically and effortlessly have a probability space on which all your random variables can be defined. And if you do not have the joint distribution, then you cannot construct appropriate random variables.


Response to the comment by the OP:

You wrote: "What your last paragraph is saying is that, given any desired joint distribution of a conceptual set of random variables (it isn't a set since we haven't constructed them yet), there exists random variables with that joint distribution. I agree, but this is precisely what I want to avoid."

I think your language is very imprecise. First, it does not make sense here to talk about the "joint distribution of a [...] set of random variables". In particular, the phrase "the joint distribution of the set $\{A,B\}$ of r.v.'s" has no meaning. Instead, we may want to talk about the joint distribution of the random pair $(A,B)$ (which is in general different from that of $(B,A)$), or of the random pair $(A,A)$ (rather than of the set $\{A,A\}=\{A\}$). More generally, we can talk about the joint distribution of any family (not set!) of r.v.'s.

Next, the existence of a family of r.v.'s with a given joint distribution is a (very simple) fact, and you cannot possibly avoid facts, even if "this is precisely what [you] want to avoid."

You also wrote: "Can you address my suggested idea that a random variable carries a set of probability spaces instead of just one? Then we do not need to modify the probability space in random variable $A$ when constructing another random variable $B$ that depends on $A$."

I think your idea for a r.v. to "carry" a set of probability spaces instead of just one was addressed in the beginning of my answer, by the suggestion to consider the equivalence classes of r.v.'s (defined on possibly different probability spaces) with the same distribution. So, as now noted in the Added sentence above, if you have a r.v. $B$ in addition to $A$, you don't need to modify anything; you can just automatically and immediately choose a certain probability space (namely, $(X\times Y,E\otimes F,\ga)$), which is one of the probability spaces "carried by $A$".


$\newcommand{\om}{\omega} \newcommand{\Om}{\Omega} \newcommand{\N}{\mathbb N} \newcommand{\la}{\lambda} \newcommand{\si}{\sigma} \newcommand{\R}{\mathbb R}$ The latest clarification by the OP appears useful, giving rise to the following construction.


Define the class $RV$ as follows.

Let $\Om:=\{0,1\}^\N$, let $F$ be the Borel $\si$-algebra with respect to the product topology over $\Om$, and let $P$ be the product probability measure $\la^{\otimes\N}$, where $\la$ is the uniform distribution on $\{0,1\}$. Clearly, the probability space $(\Om,F,P)$ is isomorphic to the Lebesgue probability space over the interval $[0,1]$.
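If one wants to play with this space on a computer, a sample point $\om\in\Om$ can be mimicked by lazily generated, memoized fair-coin bits (a Python sketch with made-up names; a pseudorandom generator stands in for $P$):

```python
import random

class SamplePoint:
    """A point omega in {0,1}^N: fair coin bits, generated lazily and cached."""
    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._bits = {}

    def __getitem__(self, n):
        # omega(n) is drawn once (uniformly on {0,1}) and then stays fixed.
        if n not in self._bits:
            self._bits[n] = self._rng.randrange(2)
        return self._bits[n]

omega = SamplePoint(seed=42)
print([omega[n] for n in range(1, 11)])  # first ten coordinates of omega
```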

Say that a subset $S$ of $\N$ is thin if the cardinality of $S\cap[n]$ is $o(n)$ as $n\to\infty$, where $[n]:=\{1,\dots,n\}$. (For instance, the set of all perfect squares is thin, since then the cardinality of $S\cap[n]$ is $\lfloor\sqrt n\rfloor=o(n)$.)

Let now $RV$ be the set of all (say real-valued) random variables (r.v.'s) $A$ defined on the probability space $(\Om,F,P)$ such that for some thin $S=S_A\subset\N$, some Borel function $f=f_A\colon\{0,1\}^S\to\R$, and all $\om\in\Om$ we have $$A(\om)=f(\om|_S);$$ that is, $A\in RV$ iff $A(\om)$ depends only on the values of the function $\om$ on a thin subset $S$ of $\N$.

Clearly, for any $k\in\N$, any r.v.'s $A_1,\dots,A_k$ in $RV$, and any Borel function $g\colon\R^k\to\R$, we have $g(A_1,\dots,A_k)\in RV$. This follows because the union of finitely many thin subsets of $\N$ is thin.
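Here is a small Python sketch of the class $RV$ and of this closure property (all names are mine, and an index set is truncated to finitely many coordinates, since any one evaluation reads only finitely many bits): a member of $RV$ is stored as the pair $(S_A,f_A)$, and $g(A_1,\dots,A_k)$ is indexed by the union of the $S_{A_i}$'s.

```python
class RV:
    """A r.v. with A(omega) = f(omega|_S), S a (thin) subset of N."""
    def __init__(self, S, f):
        self.S = tuple(sorted(S))   # the thin index set S_A
        self.f = f                  # the Borel function f_A

    def __call__(self, omega):      # omega: a map from indices to bits
        return self.f(tuple(omega[n] for n in self.S))

def compose(g, *rvs):
    """g(A_1,...,A_k): indexed by the union of the A_i's index sets,
    which is again thin (a finite union of thin sets is thin)."""
    S = tuple(sorted(set().union(*(set(rv.S) for rv in rvs))))
    lookup = {n: i for i, n in enumerate(S)}
    def f(bits):
        return g(*(rv.f(tuple(bits[lookup[n]] for n in rv.S)) for rv in rvs))
    return RV(S, f)

# A and B depend on bits indexed by squares, a thin set.
A = RV({1, 4}, lambda bits: bits[0] + bits[1])
B = RV({9, 16}, lambda bits: bits[0] * bits[1])
A_plus_B = compose(lambda a, b: a + b, A, B)

omega = {1: 1, 4: 0, 9: 1, 16: 1}   # the only coordinates read here
print(A(omega), B(omega), A_plus_B(omega))   # 1 1 2
```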

Moreover, for any $k\in\N$ and any probability distribution $\nu$ on $\R^k$, there are r.v.'s $A_1,\dots,A_k$ in $RV$ such that the "joint" distribution of $(A_1,\dots,A_k)$ is $\nu$. This follows because there are infinite thin subsets of $\N$.
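Concretely (a finite-precision Python sketch; the quantile-transform step is standard, the names are mine): the bits of $\om$ along an infinite thin set, read as a binary expansion, give a r.v. uniform on $[0,1]$, and composing with a quantile function then realizes any prescribed distribution on $\R$.

```python
import math
import random
from collections import defaultdict

rng = random.Random(0)
omega = defaultdict(lambda: rng.randrange(2))   # lazily drawn fair coin bits

def uniform_from_bits(omega, S, precision=53):
    """U = sum_i omega(n_i) 2^{-i} over the first indices n_1 < n_2 < ... in S:
    uniform on [0,1] (up to float precision), depending only on omega|_S."""
    return sum(omega[n] * 2.0 ** -(i + 1) for i, n in enumerate(S[:precision]))

S = [m * m for m in range(1, 60)]        # the squares: an infinite thin set
u = uniform_from_bits(omega, S)
x = -math.log(1.0 - u)                   # quantile transform: exponential(1)
print(u, x)
```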

Further, for any countable set $T$ and consistent family of finite-dimensional probability distributions on $\R^S$ indexed by finite subsets $S$ of $T$, there is a family $(A_t)_{t\in T}$ of r.v.'s in $RV$ with the given finite-dimensional distributions. This follows because there is a countable set of disjoint infinite thin subsets of $\N$.
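An explicit witness (the choice of prime powers here is mine): the sets $S_k=\{p_k,p_k^2,p_k^3,\dots\}$, where $p_k$ is the $k$-th prime, are infinite, pairwise disjoint by unique factorization, and thin, since $S_k\cap[n]$ has $O(\log n)$ elements. A small Python check:

```python
def nth_prime(k):
    """The k-th prime (k >= 1), by trial division; fine for small k."""
    count, n = 0, 1
    while count < k:
        n += 1
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return n

def thin_set(k, up_to):
    """Powers of the k-th prime up to `up_to`: an initial segment of an
    infinite thin subset of N; distinct k give disjoint sets."""
    p = nth_prime(k)
    out, q = [], p
    while q <= up_to:
        out.append(q)
        q *= p
    return out

print(thin_set(1, 100))   # [2, 4, 8, 16, 32, 64]
print(thin_set(2, 100))   # [3, 9, 27, 81]
```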

Furthermore, for any r.v.'s $A$ and $B$ in $RV$ there is a r.v. $K\in RV$ such that $K$ is independent of $(A,B)$ and $P(K=1)=P(K=2)=P(K=3)=1/3$. Letting then $$C:=A\,1(K=1)+B\,1(K=2)+(A+B)\,1(K=3),$$ we get a r.v. $C\in RV$ such that "$C$ is $A$ with probability $1/3$, $C$ is $B$ with probability $1/3$, and $C$ is $A+B$ with probability $1/3$", as desired.
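A sketch of this last construction in the same style (Python; the names and the particular index choices are mine): $K$ is obtained by rejection sampling on pairs of fresh coin bits taken from a thin set disjoint from $S_A\cup S_B$, and $C$ is then assembled as above.

```python
import random
from collections import defaultdict

rng = random.Random(1)
omega = defaultdict(lambda: rng.randrange(2))   # lazily drawn fair coin bits

A = lambda w: w[1]          # S_A = {1}
B = lambda w: w[4]          # S_B = {4}

def K(w):
    """Uniform on {1,2,3}, independent of (A,B): rejection sampling on pairs
    of bits indexed by the squares >= 9 (an infinite thin set disjoint from
    S_A and S_B); terminates with probability 1."""
    m = 3
    while True:
        k = 2 * w[m * m] + w[(m + 1) * (m + 1)]   # uniform on {0,1,2,3}
        m += 2
        if k < 3:
            return k + 1

k = K(omega)
C = {1: A(omega), 2: B(omega), 3: A(omega) + B(omega)}[k]
print(k, C)
```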


In view of the Borel isomorphism theorem, here instead of real-valued r.v.'s we may consider r.v.'s with values in arbitrary Polish spaces.


Proposition: Let $\kappa$ be some infinite cardinal number. There exists a probability space $(\Omega,\Sigma,\nu)$ that carries $\kappa$ independent random variables with uniform distribution on $[0,1]$ and such that for every family $\langle g_i\rangle_{i\in I}$ of real-valued random variables with $\#I\leq\kappa$ and every probability measure $\mu$ on $\mathbb{R}^I\times\mathbb{R}^J$ with $\#J\leq\omega$ and $\mathbb{R}^I$-marginal equal to the joint distribution of $\langle g_i\rangle_{i\in I}$, there exists a family of random variables $\langle g_j\rangle_{j\in J}$ such that the joint distribution of $\langle g_i\rangle_{i\in I\cup J}$ equals $\mu$.

One can take $\Omega=\{0,1\}^{\kappa^+}$, $\Sigma$ the product $\sigma$-algebra, and $\nu$ the fair coin-flipping measure. The proposition can be proven using ideas from this paper.

The proposition shows that one can find a probability space that can carry a lot of nontrivial random variables and such that one can always add, ex post, countably many random variables at a time, with any prescribed joint distribution relative to the random variables already present. One never runs out of space; there is no need to enlarge the underlying probability space.

This is probably more than enough for any reasonable probabilistic argument, but it works only with set-many random variables. If one wants random variables indexed by the class of ordinals, one can view the class of all sets as a genuine set in a larger universe that contains a strongly inaccessible cardinal; this seems to be the preferred method of foundation-conscious category theorists for dealing with similar size problems.