Does convergence in law to absolutely continuous limit imply convergence in convex distance?

What is essential here is that the distribution of $X$ assigns little mass to sets which are essentially $(d-1)$-dimensional.


The standard approach to problems of this kind is to estimate $$ \operatorname{P}(X_n \in K) - \operatorname{P}(X \in K) $$ from above by $$ \operatorname{E}(g(X_n)) - \operatorname{E}(f(X)) , $$ and from below by $$ \operatorname{E}(f(X_n)) - \operatorname{E}(g(X)) , $$ where $0 \leqslant f \leqslant \mathbb{1}_K \leqslant g \leqslant 1$, and $f$ and $g$ are continuous. In the $k$-th step we choose $f$ and $g$ in such a way that $\operatorname{E}(g(X) - f(X)) < \tfrac{1}{2 k}$. Convergence of $\operatorname{E}(f(X_n))$ to $\operatorname{E}(f(X))$ and convergence of $\operatorname{E}(g(X_n))$ to $\operatorname{E}(g(X))$ imply that $$ -\tfrac{1}{k} \leqslant \operatorname{P}(X_n \in K) - \operatorname{P}(X \in K) \leqslant \tfrac{1}{k} $$ for all $n$ large enough.
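Here is a quick numerical sketch of this bracketing (in dimension one; the interval $K$, the ramp width $\varepsilon$ and the standard normal law are all illustrative choices of mine): $f$ and $g$ ramp linearly across an $\varepsilon$-neighbourhood of $\partial K$, and $\operatorname{E}(g(X) - f(X))$ is of order $\varepsilon$ because $X$ puts no mass on $\partial K$.

```python
import numpy as np
from scipy import stats

# Sketch: bracket the indicator of K = [a, b] between continuous
# piecewise-linear functions f <= 1_K <= g that ramp over width eps.
a, b, eps = -1.0, 1.0, 1e-3

def f(x):  # vanishes off K, equals 1 on [a + eps, b - eps]
    return np.clip(np.minimum(x - a, b - x) / eps, 0.0, 1.0)

def g(x):  # equals 1 on K, vanishes off [a - eps, b + eps]
    return np.clip(np.minimum(x - a + eps, b - x + eps) / eps, 0.0, 1.0)

X = stats.norm()                        # an absolutely continuous limit law
grid = np.linspace(-5.0, 5.0, 200001)
dx = grid[1] - grid[0]
gap = np.sum((g(grid) - f(grid)) * X.pdf(grid)) * dx
print(gap)  # E(g(X) - f(X)): of order eps, since P(X in {a, b}) = 0
```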

This works as expected, i.e., it leads to convergence of $\operatorname{P}(X_n \in K)$ to $\operatorname{P}(X \in K)$, if (and only if) $\operatorname{P}(X \in \partial K) = 0$: then (and only then) it is possible to choose $f$ and $g$ with the desired property. (Convergence can indeed fail otherwise: take $X_n = \tfrac{1}{n}$ deterministically, $X = 0$ and $K = (-\infty, 0]$, so that $\operatorname{P}(X_n \in K) = 0$ for all $n$, while $\operatorname{P}(X \in K) = 1$.)


Now, in order to get uniform convergence over a class of sets $K$ — here the class of convex subsets of $\mathbb{R}^d$ — at every step $k$ we should choose $f$ and $g$ from a fixed finite-dimensional space of functions (which, of course, may well depend on $k$). One solution for the class of convex $K$ is as follows.

By tightness, there is $R > 1$ such that $\operatorname{P}(|X_n| > R) < \tfrac{1}{4 k}$ uniformly in $n$, and also $\operatorname{P}(|X| > R) < \tfrac{1}{4 k}$. We will choose a (small) $\delta \in (0, 1)$ at a later stage. We cover the ball $\overline{B}(0, R)$ using $d$-dimensional cubes $Q_j$ with edge length $\delta$ and vertices at lattice points $(\delta \mathbb{Z})^d$. To be specific, suppose that the cubes $Q_j$ are open sets. Let $h_j = \mathbb{1}_{Q_j}$ be the indicator of $Q_j$. Then $$h_1 + \ldots + h_J = 1 \quad \text{a.e. on $\overline{B}(0, R)$}$$ (more precisely: everywhere on $\overline{B}(0, R)$, except possibly on the faces of the cubes $Q_j$). We add $h_0 = 1 - \sum_{j = 1}^J h_j$ to this collection. Observe that $h_0 = 0$ a.e. on $\overline{B}(0, R)$, and hence $\operatorname{E}(h_0(X)) \leqslant \operatorname{P}(|X| > R) < \tfrac{1}{4 k}$. Furthermore, by the first part of this answer, we already know that $\operatorname{E}(h_j(X_n)) = \operatorname{P}(X_n \in Q_j)$ converges to $\operatorname{E}(h_j(X)) = \operatorname{P}(X \in Q_j)$ as $n \to \infty$ for every $j = 0, 1, \ldots, J$ (because the distribution of $X$ does not charge the boundaries of the cubes $Q_j$).
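For concreteness, here is one way to realize the cube partition numerically (the assignment of a point to a cube by integer division is my implementation choice, not part of the argument):

```python
import numpy as np
from itertools import product

# Sketch of the partition h_1 + ... + h_J = 1 a.e. on B(0, R): enumerate the
# grid cubes meeting the ball and evaluate the indicators at a point.
d, R, delta = 2, 3.0, 0.5
n_half = int(np.ceil(R / delta)) + 1

def meets_ball(j):
    lo = np.array(j, dtype=float) * delta
    nearest = np.clip(0.0, lo, lo + delta)   # point of the cube closest to 0
    return np.linalg.norm(nearest) <= R

cubes = {j for j in product(range(-n_half, n_half), repeat=d) if meets_ball(j)}

def cube_index(x):
    # multi-index of the open grid cube containing x (faces form a null set)
    return tuple(np.floor(np.asarray(x) / delta).astype(int))

x = np.array([1.3, -0.7])
j = cube_index(x)
h_j = 1.0 if j in cubes else 0.0   # the one indicator equal to 1 at x
h_0 = 1.0 - h_j                    # h_0 = 1 - sum_j h_j, zero a.e. on B(0, R)
print(j, h_j, h_0)
```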

Given a convex set $K$, we define $f$ to be the sum of all $h_j$ corresponding to cubes $Q_j$ contained in $K$, and $g$ to be the sum of all $h_j$ corresponding to cubes $Q_j$ which intersect $K$, together with $h_0$ if $K$ is not contained in the union of the cubes $Q_j$. Then $$0 \leqslant f \leqslant \mathbb{1}_K \leqslant g \leqslant 1 \qquad \text{a.e.}$$ As in the first part of the proof, $$ \operatorname{E}(f(X_n)) - \operatorname{E}(g(X)) \leqslant \operatorname{P}(X_n \in K) - \operatorname{P}(X \in K) \leqslant \operatorname{E}(g(X_n)) - \operatorname{E}(f(X)) . $$ For $n$ large enough we have $$ \sum_{j = 0}^J |\operatorname{E}(h_j(X_n)) - \operatorname{E}(h_j(X))| \leqslant \tfrac{1}{2 k} , $$ and so $$ |\operatorname{E}(f(X_n)) - \operatorname{E}(f(X))| \leqslant \tfrac{1}{2 k} , \qquad |\operatorname{E}(g(X_n)) - \operatorname{E}(g(X))| \leqslant \tfrac{1}{2 k} .$$ Thus, $$ -\tfrac{1}{2k} - \operatorname{E}(g(X) - f(X)) \leqslant \operatorname{P}(X_n \in K) - \operatorname{P}(X \in K) \leqslant \tfrac{1}{2k} + \operatorname{E}(g(X) - f(X)) $$ for $n$ large enough, uniformly with respect to $K$. It remains to choose $\delta > 0$ such that $\operatorname{E}(g(X) - f(X)) < \tfrac{1}{2k}$ uniformly with respect to $K$; once this is proved, we have $$ -\tfrac{1}{k} \leqslant \operatorname{P}(X_n \in K) - \operatorname{P}(X \in K) \leqslant \tfrac{1}{k} $$ for $n$ large enough, uniformly with respect to $K$, as desired.
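To see these $f$ and $g$ in action (with a concrete $K$, grid size and law chosen by me for illustration): for $K$ a Euclidean disk, containment and intersection of a grid cube can be tested exactly, and in this case $\operatorname{E}(g(X) - f(X))$ is the mass $X$ assigns to the cubes meeting $K$ but not contained in it.

```python
import numpy as np
from itertools import product

# Sketch with a concrete convex set (my choice): K = closed disk of radius r
# in R^2. f sums the h_j over cubes inside K, g over cubes meeting K.
r, delta, R = 1.5, 0.05, 3.0
n_half = int(np.ceil(R / delta)) + 1

inside, meeting = set(), set()
for j in product(range(-n_half, n_half), repeat=2):
    lo = np.array(j, dtype=float) * delta
    corners = [lo + delta * np.array(s) for s in product((0, 1), repeat=2)]
    nearest = np.clip(0.0, lo, lo + delta)       # point of Q_j closest to 0
    if max(np.linalg.norm(c) for c in corners) <= r:
        inside.add(j)                            # Q_j contained in K
    elif np.linalg.norm(nearest) <= r:
        meeting.add(j)                           # Q_j meets K but sticks out

# g - f is the sum of h_j over `meeting`; E(g(X) - f(X)) is the mass X gives
# to these boundary cubes. Monte Carlo estimate for X standard normal:
rng = np.random.default_rng(1)
samples = rng.standard_normal((100000, 2))
indices = [tuple(row) for row in np.floor(samples / delta).astype(int)]
gap = np.mean([j in meeting for j in indices])
print(len(meeting), gap)   # m, and an estimate of E(g(X) - f(X))
```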

By definition, $g - f$ is the sum of some number of the functions $h_j$ with $j \geqslant 1$ — say, $m$ of them — and possibly $h_0$. Recall that $\operatorname{E}(h_0(X)) \leqslant \operatorname{P}(|X| > R) < \tfrac{1}{4 k}$. It follows that $$ \operatorname{E}(g(X) - f(X)) \leqslant \tfrac{1}{4 k} + \sup \{ \operatorname{P}(X \in A) : \text{$A$ is a union of $m$ cubes $Q_j$} \} . \tag{$\heartsuit$} $$ We now estimate the size of $m$.

Lemma. For every convex $K$, the number $m$ defined above is bounded by a constant (depending only on $d$) times $(R / \delta)^{d - 1}$.

Proof: Suppose that $Q_j$ intersects $K$, but it is not contained in $K$. Consider any point $z$ of $K \cap Q_j$, and the supporting hyperplane $$\pi = \{x : \langle x - z, \vec{u} \rangle = 0\}$$ of $K$ at that point. We choose $\vec{u}$ in such a way that $K$ is contained in $\pi^- = \{x : \langle x - z, \vec{u} \rangle \leqslant 0\}$. If the boundary of $K$ is smooth at $z$, then $\vec{u}$ is simply the outward normal vector to the boundary of $K$ at $z$.

To simplify notation, assume that $\vec{u}$ has all coordinates non-negative. Choose two opposite vertices $x_1, x_2$ of $Q_j$ in such a way that $\vec{v} = x_2 - x_1 = (\delta, \ldots, \delta)$. Then the coordinates of $x_2 - z$ are all positive. It follows that for every $n = 1, 2, \ldots$, all coordinates of $(x_1 + n \vec{v}) - z = (x_2 - z) + (n - 1) \vec{v}$ are positive, and therefore the translated cubes $Q_j + n \vec{v}$ all lie in $\pi^+ = \{x : \langle x - z, \vec{u} \rangle \geqslant 0\}$; in fact, since these cubes are open and $\vec{u} \ne 0$, they lie in the open half-space $\{x : \langle x - z, \vec{u} \rangle > 0\}$. In particular, all these cubes are disjoint from $K$.

In the general case, when the coordinates of $\vec{u}$ have arbitrary signs, we obtain a similar result, but with $\vec{v} = (\pm \delta, \ldots, \pm \delta)$ for some choice of signs. It follows that with each $Q_j$ intersecting $K$ but not contained in $K$ we can associate the directed line $x_2 + \mathbb{R} \vec{v}$, and this line uniquely determines $Q_j$: it is the last cube $Q$ with two vertices on this line that intersects $K$ (with "last" referring to the direction of the line).

It remains to observe that the number of lines with the above property is bounded by $2^d$ (the number of possible vectors $\vec{v}$) times the number of points in the projection of $(\delta \mathbb{Z})^d \cap \overline{B}(0, R + \delta \sqrt{d})$ (a ball large enough to contain the vertices of all cubes $Q_j$) onto the hyperplane perpendicular to $\vec{v}$. The latter is bounded by a constant times $(R / \delta)^{d - 1}$, and the proof is complete. $\square$
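As a sanity check of the lemma (a sketch; the disk and grid sizes are mine), one can count the boundary cubes numerically: for $d = 2$ the bound reads $m = O(R / \delta)$, so $m \delta$ should remain bounded as $\delta \to 0$.

```python
import numpy as np
from itertools import product

# Numerical check of the lemma for d = 2, K = closed unit disk:
# count grid cubes that meet K but are not contained in K.
def boundary_cube_count(delta, r=1.0):
    n_half = int(np.ceil(r / delta)) + 1
    m = 0
    for j in product(range(-n_half, n_half), repeat=2):
        lo = np.array(j, dtype=float) * delta
        corners = [lo + delta * np.array(s) for s in product((0, 1), repeat=2)]
        nearest = np.clip(0.0, lo, lo + delta)
        meets = np.linalg.norm(nearest) <= r
        contained = max(np.linalg.norm(c) for c in corners) <= r
        m += meets and not contained
    return m

for delta in (0.2, 0.1, 0.05, 0.025):
    m = boundary_cube_count(delta)
    print(delta, m, m * delta)  # m * delta stays bounded, as the lemma predicts
```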

(The above proof includes a simplification due to Iosif Pinelis.)

Since the Lebesgue measure of each $Q_j$ is equal to $\delta^d$, the measure of $A$ in ($\heartsuit$) is bounded by $m \delta^d \leqslant C R^{d - 1} \delta$ for some constant $C$. Furthermore, since the distribution of $X$ is absolutely continuous, we can find $\delta > 0$ small enough that $\operatorname{P}(X \in A) < \tfrac{1}{4 k}$ for every set $A$ with measure at most $C R^{d - 1} \delta$ (recall that $R$ was chosen before we fixed $\delta$). By ($\heartsuit$), we find that $$ \operatorname{E}(g(X) - f(X)) \leqslant \tfrac{1}{4 k} + \sup \{ \operatorname{P}(X \in A) : |A| \leqslant C R^{d - 1} \delta\} \leqslant \tfrac{1}{2 k} , $$ uniformly with respect to $K$.
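For instance (a special case with a bounded-density assumption of my own, not needed for the general argument): if $X$ has a density $p \leqslant M$, then $\operatorname{P}(X \in A) \leqslant M |A| \leqslant M C R^{d - 1} \delta$, so the explicit choice $$ \delta < \frac{1}{4 k M C R^{d - 1}} $$ already gives $\operatorname{P}(X \in A) < \tfrac{1}{4 k}$. For a general absolutely continuous law one instead invokes the $\varepsilon$–$\eta$ characterization of absolute continuity: for every $\varepsilon > 0$ there is $\eta > 0$ such that $|A| < \eta$ implies $\operatorname{P}(X \in A) < \varepsilon$.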