Precise definition of the support of a random variable

I am not entirely convinced by the line "the sample space is also called the support of a random variable".

That looks quite wrong to me.

What is even more confusing is: when we talk about support, do we mean the support of $X$ or that of the probability measure $\Pr$?

In rather informal terms, the "support" of a random variable $X$ is defined as the support (in the function sense) of the density function $f_X(x)$.

I say "in rather informal terms" because the density function is a quite intuitive and practical concept for dealing with probabilities, but not so much when speaking of probability in general and formal terms. For one thing, it's not a proper function for "discrete distributions" (again, a practical but loose concept).

In more formal/strict terms, the comment of Stefan fits the bill.

Do we interpret the support to be

- the set of outcomes in $\Omega$ which have a non-zero probability, or
- the set of values that $X$ can take with non-zero probability?

Neither, actually. Consider a random variable that has a uniform density on $[0,1]$, with $\Omega = \mathbb{R}$. Then the support is the full interval $[0,1]$ - which is a subset of $\Omega$. A point such as $x=1/2$ belongs to the support, yet the probability that $X$ takes this exact value is zero.


TL;DR

The support of a r.v. $X$ can be defined as the smallest closed set $R_X \in \mathcal{B}$ such that its probability is 1, as Did pointed out in their comment. An alternative definition is the one given by Stefan Hansen in his comment: the set of points in $\mathbb{R}$ around which any ball (i.e. open interval in 1-D) with nonzero radius has a nonzero probability. (See the section "Support of a random variable" below for a proof of the equivalence of these definitions.)

Intuitively, if any neighbourhood around a point, no matter how small, has a nonzero probability, then that point is in the support, and vice-versa.
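This intuition can be checked numerically for the uniform-on-$[0,1]$ example above (a sketch; the helper names are my own, and checking finitely many radii only approximates the "for all $r>0$" condition):

```python
# Ball criterion for the support, illustrated with X ~ Uniform[0, 1].
# For this distribution, P_X((a, b)) is the length of (a, b) ∩ [0, 1].

def p_interval(a, b):
    """Probability that a Uniform[0, 1] variable lands in (a, b)."""
    lo, hi = max(a, 0.0), min(b, 1.0)
    return max(hi - lo, 0.0)

def in_support(x, radii=(1.0, 0.1, 0.01, 0.001)):
    """x is in the support iff every ball (x - r, x + r) has positive
    probability; we spot-check a few shrinking radii."""
    return all(p_interval(x - r, x + r) > 0 for r in radii)

# x = 1/2 is in the support even though Pr(X = 1/2) = 0:
assert in_support(0.5)
# x = 2 is outside: small balls around it miss [0, 1] entirely.
assert not in_support(2.0)
# The boundary points 0 and 1 are in the support (the support is closed):
assert in_support(0.0) and in_support(1.0)
```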



I'll start from the beginning to make sure we're using the same definitions.

Preliminary definitions

Probability space

$\newcommand{\A}{\mathcal{A}} \newcommand{\powset}[1]{\mathcal{P}(#1)} \newcommand{\R}{\mathbb{R}} \newcommand{\deq}{\stackrel{\scriptsize def}{=}} \newcommand{\N}{\mathbb{N}}$ Let $(\Omega, \A, \Pr)$ be a probability space, defined as follows:

  • $\Omega$ is the set of outcomes

  • $\A \subseteq \powset{\Omega} $ is the collection of events, a $\sigma$-algebra

  • $\Pr\colon\ \A\to[0,1]$ is the mapping of events to their probabilities.
    > note: the original version of this answer had the wrong definition
    It has to satisfy some properties:

    • $\Pr(\Omega) = 1$
    • $\Pr$ has to be countably additive: for any countable collection of pairwise disjoint events $A_1, A_2, \ldots \in \A$, $\Pr\left(\bigcup_n A_n\right) = \sum_n \Pr(A_n)$
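For concreteness, here is a minimal finite example of these axioms (a sketch; the die-roll space is my own choice, and on a finite $\Omega$ the full power set is a valid $\sigma$-algebra):

```python
from itertools import chain, combinations

# A finite probability space: Omega = die outcomes, A = the power set of
# Omega, Pr = the uniform measure.
omega = frozenset({1, 2, 3, 4, 5, 6})
events = [frozenset(s) for s in chain.from_iterable(
    combinations(omega, k) for k in range(len(omega) + 1))]

def pr(event):
    """Uniform probability measure on the die."""
    return len(event) / len(omega)

assert pr(omega) == 1                      # Pr(Omega) = 1
a, b = frozenset({1, 2}), frozenset({5})   # two disjoint events
assert pr(a | b) == pr(a) + pr(b)          # (finite) additivity
assert all(pr(e) >= 0 for e in events)     # non-negativity
```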

Random variable

A random variable $X$ is defined as a map $X\colon\; \Omega \to \R$ such that, for any $x\in\R$, the set $\{\omega \in \Omega \mid X(\omega) \le x\}$ is an element of $\A$; ergo, an element of $\Pr$'s domain to which a probability can be assigned. This condition is necessary in order to define the following concepts.

Cumulative Distribution Function of a r.v.

The probability distribution function (or cumulative distribution function) of a random variable $X$ is defined as the map $$ \begin{align} F_X \colon \quad \R \ &\to\ [0, 1] \\ x\ &\mapsto\ \Pr(X \le x) \deq \Pr(X^{-1}(I_x)) \end{align} $$

We can see that

  • $\Pr(X > x) \deq \Pr(\overline{X^{-1}(I_x)}) = 1 - \Pr(X^{-1}(I_x)) = 1 - F_X(x)$

    where $I_x \deq (-\infty, x]$, and $\overline{A}$ denotes the complement of $A$ in $\Omega$. Notice that this probability is defined since $\A$ is a $\sigma$-algebra (and thus closed under set complement).

  • $\Pr(X < x) \deq \Pr\left(\bigcup\limits_{n\in\N} X^{-1} \left(I_{x-\frac{1}{n}}\right)\right) = \lim\limits_{t \to x^-} \Pr(X \le t) = \lim\limits_{t \to x^-} F_X(t)$

    since $X^{-1} \left(I_{x-\frac{1}{n}}\right) \subseteq X^{-1} \left(I_{x-\frac{1}{n+1}}\right)$ for all $n\in\N$. Note again that the union is valid since it is countable and $\A$ is a $\sigma$-algebra.

  • $\Pr(X = x) \deq \Pr(X^{-1}(I_x) \setminus A_{<x}) = F_X(x) - F_X(x^-)$

    where $\displaystyle A_{<x} \deq \bigcup_{n\in\N} X^{-1} \left(I_{x-\frac{1}{n}}\right)$ and $F_X(x^-) \deq \lim\limits_{t \to x^-} F_X(t)$.

and so forth.
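These three relations can be checked numerically for a small discrete random variable (a sketch; the example distribution is my own, and the left limit is approximated by evaluating just below $x$):

```python
# A discrete r.v. given by its probability mass function.
pmf = {0.0: 0.2, 1.0: 0.5, 2.5: 0.3}

def F(x):
    """CDF: F_X(x) = Pr(X <= x)."""
    return sum(p for v, p in pmf.items() if v <= x)

def F_left(x, eps=1e-9):
    """Left limit F_X(x^-), approximated by evaluating just below x."""
    return F(x - eps)

x = 1.0
assert abs((1 - F(x)) - 0.3) < 1e-9           # Pr(X > 1) = 1 - F_X(1)
assert abs(F_left(x) - 0.2) < 1e-9            # Pr(X < 1) = F_X(1^-)
assert abs((F(x) - F_left(x)) - 0.5) < 1e-9   # Pr(X = 1) = jump of F_X at 1
```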


Probability measure on $\R$ by $X$

Now, the mapping defined by $X$ is sufficient to uniquely define a probability measure on $\R$; that is, a map $$ \begin{align} P_X \colon \quad \mathcal{B} \subset \powset{\R} \ &\to \ [0, 1]\\ A \ &\mapsto \ \Pr(X \in A) \deq \Pr(X^{-1}(A)) \end{align} $$ that assigns to any set $A \in \mathcal{B}$ the probability of the corresponding event in $\A$.

Here $\mathcal{B}$ is the Borel $\sigma$-algebra on $\R$, which is, loosely speaking, the smallest $\sigma$-algebra containing all of the semi-intervals $(-\infty, x]$. The reason why $P_X$ is defined only on those sets is that in our definition we only required $X^{-1}(A) \in \A$ for the semi-intervals of the form $A = (-\infty, x]$; thus $X^{-1}(A)$ is guaranteed to be an element of $\A$ only when $A$ is "generated" by those semi-intervals, their complements, and countable unions/intersections thereof (according to the rules of a $\sigma$-algebra).
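On a finite space, the pushforward measure $P_X$ can be written out directly (a sketch; the space and the variable names are my own choice):

```python
# Pushforward of Pr under X on a finite probability space:
# P_X(A) = Pr(X^{-1}(A)) = Pr({omega : X(omega) in A}).
pr = {"a": 0.1, "b": 0.4, "c": 0.5}    # Pr on Omega = {a, b, c}
X = {"a": 0.0, "b": 0.0, "c": 7.5}     # a random variable X: Omega -> R

def P_X(A):
    """Probability measure on R induced by X (A is any set of reals)."""
    return sum(p for omega, p in pr.items() if X[omega] in A)

assert abs(P_X({0.0}) - 0.5) < 1e-12        # Pr(X = 0) = Pr({a, b})
assert abs(P_X({0.0, 7.5}) - 1.0) < 1e-12   # the total mass is carried over
```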




Support of a random variable

Formal definition

Formally, the support of $X$ can be defined as the smallest closed set $R_X \in \mathcal{B}$ such that $P_X(R_X) = 1$, as Did pointed out in their comment.

An alternative but equivalent definition is the one given by Stefan Hansen in his comment:

The support of a random variable $X$ with values in $\R^n$ is the set $\{x\in \R^n \mid P_X(B(x,r))>0, \text{ for all } r>0\}$ where $B(x,r)$ denotes the ball with center at $x$ and radius $r$. In particular, the support is a subset of $\R^n$.

The equivalence can be proven as follows:

Proof
Let $R_X$ be the smallest closed set in $\mathcal{B}$ such that $P_X(R_X) = 1$. Since $R_X$ is closed, for every $x \in \R \setminus R_X$ there exists a radius $r\in\R_+$ such that the open interval (or open ball, in the more general case) $(x-r, x+r)$ is contained within $\R \setminus R_X$.

That, in turn, implies that $P_X((x-r, x + r)) = 0$: otherwise, since the interval is disjoint from $R_X$, we would have $P_X(R_X \cup (x-r,x+r)) = P_X(R_X) + P_X((x-r, x+r)) > P_X(R_X) = 1$, a contradiction.

Conversely, suppose $P_X((x-r, x+r)) = 0$ for some $x\in\R$ and $r\in\R_+$. Then $(x-r, x+r) \subseteq \R \setminus R_X$: otherwise $R_X' \deq R_X \setminus (x-r, x+r)$ would be a closed set strictly smaller than $R_X$ still satisfying $P_X(R_X') = 1$, contradicting the minimality of $R_X$. In particular, $x \notin R_X$.

This proves $\R \setminus R_X = \{x\in\R \mid \exists r \in \R_+\colon P_X((x-r, x+r)) = 0\}\,.$

Negating the predicate, one gets $R_X = \{x\in\R \mid \forall r \in \R_+\colon P_X((x-r, x+r)) > 0\}$, which is exactly Stefan Hansen's definition in the one-dimensional case. $\blacksquare$

But more often, different definitions are given.


Alternative definition for discrete random variables

A discrete random variable can be defined as a random variable $X$ such that $X(\Omega)$ is countable (either finite or countably infinite). Then, for a discrete random variable the support can be defined as

$$R_X \deq \{x\in\R \mid \Pr(X = x) > 0\}\,.$$
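For a discrete random variable, this definition is directly computable from the probability mass function (a sketch; the example distribution, including a deliberately zero-probability entry, is my own):

```python
# Support of a discrete r.v.: the values taken with strictly positive
# probability. Note the zero-probability entry at 0.0, which is excluded.
pmf = {-1.0: 0.25, 0.0: 0.0, 2.0: 0.75}

def support(pmf):
    """R_X = {x : Pr(X = x) > 0}."""
    return {x for x, p in pmf.items() if p > 0}

assert support(pmf) == {-1.0, 2.0}   # 0.0 is excluded since Pr(X = 0) = 0
```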

Note that $R_X \subseteq X(\Omega)$, and thus $R_X$ is countable. We can prove the inclusion by proving its contrapositive: if $x \notin X(\Omega)$, then $\Pr(X = x) = 0$, i.e. $x \notin R_X$.

Suppose $x \in \R$ and $x \notin X(\Omega)$. We can distinguish three cases: either $x < y$ for all $y \in X(\Omega)$, or $x > y$ for all $y \in X(\Omega)$, or neither.

Suppose $x < y$ for all $y \in X(\Omega)$. Then $\Pr(X = x) \le \Pr(X \le x) = \Pr(X^{-1}(I_x)) = \Pr(\emptyset) = 0$, since $X(\omega) > x$ for all $\omega\in\Omega$. Ergo, $x\notin R_X$.

The case in which $x > y$ for all $y \in X(\Omega)$ is analogous.

Suppose now that there exist $y_1, y_2 \in X(\Omega)$ such that $y_1 < x < y_2$. Let $S = \{y\in X(\Omega) \mid y < x\}$, which is nonempty (it contains $y_1$) and bounded above by $x$; thus $\sup S$ exists. If $\sup S < x$, then $X$ takes no values in $(\sup S, x]$, so $F_X$ is constant on $[\sup S, x]$; hence $F_X(x^-) = F_X(x)$ and $\Pr(X=x) = F_X(x) - F_X(x^-) = 0$. If $\sup S = x$, then, since $X$ never takes the value $x$, $F_X(x) = \Pr(X \le x) = \Pr(X < x) = F_X(x^-)$, and again $\Pr(X=x) = 0$. Ergo, $x \notin R_X$.


Alternative definition for continuous random variables

Notice that for absolutely continuous random variables (that is, random variables whose distribution admits a density $f_X$, which in particular implies that $F_X$ is continuous on all of $\R$), $\Pr(X = x) = 0$ for all $x\in \R$, since $F_X(x) = F_X(x^-)$. But that doesn't mean that the outcomes in $X^{-1}(\{x\})$ are "impossible", informally speaking. Thus, in this case, the support is defined as

$$ R_X = \{x \in \R \mid f_X(x) > 0\}\,,$$

which intuitively can be justified as being the set of points around which any arbitrarily small interval has strictly positive probability (the integral of the PDF over it is positive). Strictly speaking, to agree with the formal (closed-set) definition above, one takes the closure of this set.
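The same idea can be checked numerically for a simple density (a sketch; the triangular density on $[0,2]$ and the crude midpoint-rule integrator are my own choices):

```python
# Support of a continuous r.v. as {x : f_X(x) > 0}, checked against the
# "small interval has positive probability" intuition by integration.

def f(x):
    """A triangular density on [0, 2]: f(x) = 1 - |x - 1| there, 0 elsewhere."""
    return max(1.0 - abs(x - 1.0), 0.0)

def prob(a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Points where f > 0 get positive probability on any small interval around them:
assert f(0.5) > 0 and prob(0.49, 0.51) > 0
# Points where f = 0 (outside [0, 2]) get zero probability nearby:
assert f(3.0) == 0 and prob(2.9, 3.1) == 0
# The total mass integrates to (approximately) 1:
assert abs(prob(-1.0, 3.0) - 1.0) < 1e-3
```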


The support of the density function $f_X(\cdot)$ is the set of values of the random variable $X$ for which the density function is positive. That is,

$\mathcal{R}_X := \{x\in \mathbb{R} : f_X(x) > 0\}$

Note that $f_X(\cdot)$ is the probability density/mass function of the random variable $X$.