The "muscle" behind the fact that ergodic measures are mutually singular

This problem troubled me for a long time. The Birkhoff ergodic theorem (BET) is used literally everywhere in ergodic theory, and there are many situations where it is not immediately clear how necessary it really is. Below, I'll show that we don't need the BET here, just some ideas from martingale theory.

Motivation

The trouble with noninvertible measure preserving systems is that $T^{-1} \mathcal B \subsetneq \mathcal B$. A good way to think about this is that the action of $T$ coarsens phase space: more precisely, for a measurable observable $\phi : X \to \mathbb R$, we have that $\phi \circ T$ is $T^{-1} \mathcal B$-measurable.
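To spell out that last claim: for any Borel set $A \subseteq \mathbb R$, $$ \{ x : \phi(Tx) \in A \} = T^{-1}\{ x : \phi(x) \in A \} \in T^{-1} \mathcal B \, , $$ so every preimage of a Borel set under $\phi \circ T$ lies in the coarser $\sigma$-algebra $T^{-1} \mathcal B$.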

For $f = d \nu / d \mu$, what you've shown already is that $$ \int_{T^{-1} E} f \, d \mu = \int_{T^{-1} E} f \circ T \, d\mu $$ for all $E \in \mathcal B$. Let's reformulate this in terms of conditional expectation: $$ \mu (f | T^{-1} \mathcal B ) = f \circ T \, , $$ where $\mu( \cdot | \cdot)$ denotes conditional expectation. You can think of this as saying that $f = d \nu / d \mu$ is "invariant on average". Naturally this does not mean $f$ is invariant yet, since $T^{-1} \mathcal B$ may be quite coarse when compared to $\mathcal B$.
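The translation into conditional expectation is just the definition: $f \circ T$ is $T^{-1} \mathcal B$-measurable (by the remark above), and every set in $T^{-1} \mathcal B$ has the form $T^{-1} E$ with $E \in \mathcal B$, so the displayed identity says $$ \int_A f \, d\mu = \int_A f \circ T \, d\mu \qquad \text{for all } A \in T^{-1} \mathcal B \, , $$ which is precisely the defining property of $\mu(f | T^{-1} \mathcal B)$.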

Using the tower property of conditional expectation, one can show $$ \mu(f | T^{-n} \mathcal B) = f \circ T^n $$ for all $n \geq 0$. What's really astonishing is that the LHS converges almost surely, as I'll show below.
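The tower-property induction works, but the identity for general $n$ can also be checked directly, exactly as in the $n = 1$ case: for $E \in \mathcal B$, using that both $\mu$ and $\nu$ are $T$-invariant, $$ \int_{T^{-n} E} f \circ T^n \, d\mu = \int (f \, \mathbf 1_E) \circ T^n \, d\mu = \int_E f \, d\mu = \nu(E) = \nu(T^{-n} E) = \int_{T^{-n} E} f \, d\mu \, , $$ and since $f \circ T^n$ is $T^{-n} \mathcal B$-measurable, this is exactly the statement $\mu(f | T^{-n} \mathcal B) = f \circ T^n$.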

Reverse martingales

Definition: Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space and let $\mathcal F_n, n \geq 1$ be a sequence of sub-$\sigma$-algebras for which $\mathcal F_n \supset \mathcal F_{n+1}$ for all $n$. A sequence of $L^1$ random variables $(X_n)$, with $X_n$ measurable with respect to $\mathcal F_n$, is called a reverse martingale if $\mathbb E(X_n | \mathcal F_{n+1}) = X_{n+1}$ for all $n$.

Note that $X_n = \mathbb E(X_1 | \mathcal F_n)$, so $X_n$ is $L^1$ for all $n$ if $X_1$ is. This is what makes reverse martingales so nice: they are automatically Lévy martingales, hence automatically uniformly integrable (unlike forward martingales, for which checking $L^1$ convergence in the convergence theorems requires considerably more care).
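To spell out the first claim: iterating the defining relation and using the tower property (valid since $\mathcal F_n \subset \mathcal F_{n-1} \subset \cdots \subset \mathcal F_1$), $$ X_n = \mathbb E(X_{n-1} | \mathcal F_n) = \mathbb E\big( \mathbb E(X_{n-2} | \mathcal F_{n-1}) \, \big| \, \mathcal F_n \big) = \mathbb E(X_{n-2} | \mathcal F_n) = \cdots = \mathbb E(X_1 | \mathcal F_n) \, . $$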

Theorem: Let $\{X_n\}$ be a reverse martingale with respect to $(\mathcal F_n)$. Then, $X_n$ converges almost surely and in $L^1$ to $X_\infty = \mathbb E(X_1 | \mathcal F_{\infty})$, where $\mathcal F_\infty = \cap_{i = 1}^\infty \mathcal F_i$.

Since $Y_n = \mu(f | T^{-n} \mathcal B)$ is a (reverse) Lévy martingale, it converges almost surely to $\mu(f | \mathcal B_\infty)$ where $\mathcal B_\infty = \cap_{i = 0}^\infty T^{-i} \mathcal B$.
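Checking the reverse martingale property is a one-line tower-property computation: since $T^{-(n+1)} \mathcal B \subset T^{-n} \mathcal B$, $$ \mu\big( Y_n \, \big| \, T^{-(n+1)} \mathcal B \big) = \mu\Big( \mu\big(f | T^{-n} \mathcal B\big) \, \Big| \, T^{-(n+1)} \mathcal B \Big) = \mu\big(f | T^{-(n+1)} \mathcal B\big) = Y_{n+1} \, , $$ with $\mathcal F_n = T^{-n} \mathcal B$ playing the role of the decreasing filtration.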

Proving $f$ is $T$-invariant

Write $f_\infty = \mu(f | \mathcal B_\infty)$, noting that $f \circ T^n = \mu(f | T^{-n} \mathcal B)$ converges to $f_\infty$ $\mu$-almost surely, and in particular in $\mu$-measure.

To prove $f$ is $T$-invariant, let $I \subset \mathbb R$ be an interval. We will prove that for `most' intervals, $$D_I := \mu\bigg( \{ f \in I \} \Delta \{ f \circ T \in I \} \bigg) = 0,$$ where $\Delta$ denotes symmetric difference.

Fix $\epsilon > 0$ and let $n$ be sufficiently large so that $\mu(|f\circ T^m - f_\infty| > \epsilon/2) < \epsilon/2$ for both $m = n$ and $m = n+1$; in particular, $|f \circ T^n - f \circ T^{n+1}| \leq \epsilon$ off a set of $\mu$-measure less than $\epsilon$. Then

$$ D_I = \mu \big( \{ f \circ T^n \in I\} \Delta \{ f \circ T^{n+1} \in I \} \big) \leq \epsilon + \mu\big(\{f \circ T^n \in I\} \cap \{ f \circ T^{n+1} \in I_\epsilon \setminus I \}\big) +\mu\big(\{f \circ T^{n+1} \in I\} \cap \{ f \circ T^{n} \in I_\epsilon \setminus I \}\big) $$ where $I_\epsilon$ is the `fattening' of $I$ by $\epsilon$ ($I_\epsilon = [a - \epsilon, b + \epsilon]$ where $I = [a,b]$). The first equality holds because $\{ f \circ T^n \in I\} \Delta \{ f \circ T^{n+1} \in I \} = T^{-n}\big(\{ f \in I\} \Delta \{ f \circ T \in I \}\big)$ and $\mu$ is $T$-invariant; the inequality holds because, off the exceptional set of measure less than $\epsilon$, the values $f \circ T^n$ and $f \circ T^{n+1}$ differ by at most $\epsilon$, so at a point of the symmetric difference whichever of the two values falls outside $I$ must land in $I_\epsilon \setminus I$. Pulling back by $T^{-n}$ once more and using $T$-invariance of $\mu$, we have shown that

$$ D_I \leq \epsilon + \mu\big(\{f \in I\} \cap \{ f \circ T \in I_\epsilon \setminus I \}\big) +\mu\big(\{f \circ T \in I\} \cap \{ f \in I_\epsilon \setminus I \}\big) $$ for any $\epsilon > 0$. Taking $\epsilon \to 0$ and assuming $I = [a,b]$, where $a,b$ are chosen from the (at worst co-countable) set of points $c$ for which $\mu \{ f = c\} = 0$ and $\mu\{ f \circ T = c \} = 0$, we conclude that $D_I = 0$. This implies invariance of $f$.
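To spell out the final step: the admissible endpoints form a co-countable, hence dense, subset of $\mathbb R$, so we may fix a countable dense set $D$ of admissible endpoints. If $f(x) \neq f(Tx)$ (both values are finite for $\mu$-a.e. $x$), pick $a \in D$ strictly between the two values and $b \in D$ larger than both; then $I = [a,b]$ contains exactly one of $f(x), f(Tx)$, i.e. $x \in \{f \in I\} \Delta \{f \circ T \in I\}$. Therefore, up to a $\mu$-null set, $$ \{ f \neq f \circ T \} \subset \bigcup_{a,b \in D, \, a < b} \Big( \{f \in [a,b]\} \Delta \{f \circ T \in [a,b]\} \Big) \, , $$ a countable union of $\mu$-null sets.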


A Blumenthal has already given a neat answer, but there is a more elementary argument for the invariance of $f$ that requires neither the ergodic theorem nor the backward martingale convergence theorem. This is shown in Peter Walters' book (Theorem 6.10).

First, note that for every measurable $E$, \begin{align*} \mu(T^{-1}E\setminus E) &= \mu(T^{-1}E)-\mu(T^{-1}E\cap E) \\ &= \mu(E) - \mu(T^{-1}E\cap E) \\ &= \mu(E\setminus T^{-1}E) \;. \end{align*} This is true for every invariant measure, in particular, \begin{align*} \nu(T^{-1}E\setminus E) &= \nu(E\setminus T^{-1}E) \;. \end{align*} for every measurable $E$.

Now, for $r>0$, let $E_r:=\{x: f(x)<r\}$. Then, since $\nu(A)=\int_A f\,\mathrm{d}\mu$ for every measurable $A$, the identity for $\nu$ applied to $E=E_r$ gives \begin{align*} \int_{T^{-1}E_r\setminus E_r}f\,\mathrm{d}\mu &= \int_{E_r\setminus T^{-1}E_r}f\,\mathrm{d}\mu \;. \end{align*} Observe that $f\geq r$ on $T^{-1}E_r\setminus E_r$ and $f<r$ on $E_r\setminus T^{-1}E_r$. Therefore, $\mu(T^{-1}E_r\setminus E_r)=\mu(E_r\setminus T^{-1}E_r)=0$.
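To spell out the "therefore": by the first identity (applied to $\mu$), the two sets have the same $\mu$-measure, say $m$. If $m$ were positive, then \begin{align*} r\,m \;\leq\; \int_{T^{-1}E_r\setminus E_r}f\,\mathrm{d}\mu \;=\; \int_{E_r\setminus T^{-1}E_r}f\,\mathrm{d}\mu \;<\; r\,m \;, \end{align*} where the last inequality is strict because $f<r$ everywhere on a set of positive measure $m$; this is a contradiction, so $m=0$.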

In words, this says that for every $r>0$, the set of points $x$ such that either $f(Tx)<r\leq f(x)$ or $f(x)<r\leq f(Tx)$ has $\mu$-measure $0$, and this means $f(x)$ and $f(Tx)$ must agree almost everywhere.

(More precisely, \begin{align*} \mu\left(\{x: f(Tx)<f(x)\}\right) &\leq \sum_{r\in\mathbb{Q}^+} \mu(T^{-1}E_r\setminus E_r) = 0 \end{align*} and similarly, $\mu\left(\{x: f(x)<f(Tx)\}\right)=0$. Hence, $f\circ T=f$ $\mu$-almost everywhere.)