Why is the Barycenter operation in Hadamard spaces Lipschitz continuous?

Here is a proof in the setting of complete simply connected Riemannian manifolds of nonpositive sectional curvature. The case of general CAT(0) spaces should be similar, but there are some technical details which make the discussion less pleasant in that case: One needs to smooth out convex functions of one variable (which can be done via convolution with standard mollifiers), one also has to address the issue that geodesics in general CAT(0) spaces are not extendible as well as discuss the 1-st and 2nd variation formulae for the lengths of geodesics in terms of Alexandrov angles (instead of Riemannian ones). Since the solution is already much longer than I would have liked, I will not pursue the general CAT(0) case.

Another remark is that the same proof with very minor modifications goes through if one puts some fixed probability measures on the finite sets in question and considers barycenters of these measures.

Thus, let $M$ be a Riemannian manifold as above, $P=(p_1,...,p_n)$, $Q=(q_1,...q_n)$. Let $b_P, b_Q$ denote their respective barycenters. The first thing to note is that if we rescale the metric on $M$, the locations of the barycenters do not change and the nonpositive curvature condition is also retained. Furthermore, the Lipschitz constant of the barycenter map will not change. Therefore, after rescaling, we can (and will) assume that $$ \max_{i\ne j} d(p_i, p_j)= d(p_1, p_2)=1. $$ The barycenter $b_P$ of $P$ lies in the convex hull of $P$ (which has diameter $1$), hence, its distance to all the points $p_i$ is at most $1$. Since the Lipschitz property is a local condition, it suffices to consider the case when $$ \max_{i} d(p_i,q_i) $$ is small, say, $\le \frac{1}{10}$. Define convex functions $\Phi_i(x):= d(p_i, x)$, $\Psi_i(x)= d(q_i,x)$, $i=1,...,n$ on $M$ and $$ F(x)= \sum_i \Phi^2_i(x), G(x)= \sum_i \Psi^2_i(x). $$ We consider the following interpolation between the functions $F$ and $G$: $$ H(x,t)= \sum_i ( (1-t) \Phi_i + t \Psi_i)^2. $$

Note. When I first tried to use the convex interpolation between $F$ and $G$, the argument did not work.

Consider the complete geodesic $\gamma$ in $M$ passing through the barycenters of $P$ and $Q$ respectively and restrict $\Phi_i$'s, $\Psi_i$'s and $H$ to this geodesic. The restriction of $H(\cdot, t)$ is proper strictly convex function which I will denote $h(s,t)$. For each $t$ let $s_t$ denote the point of unique minimum of $h(s,t)$ on ${\mathbb R}$; then $\gamma(s_0)$ and $\gamma(s_1)$ are the barycenters of $F$ and $G$ respectively. Let $s_0$ and $s_1\in {\mathbb R}$ be such that $\gamma(s_0)=b_P$, $\gamma(s_1)=b_Q$.

I will get uniform Lipschitz bound on the function $t\mapsto s_t, 0\le t\le 1$, and use it to get a uniform Lipschitz bound on the function $$ (P,Q)\mapsto d(b_P, b_Q). $$ (Of course, the global minumum of $H(t, \cdot)$ need not lie on the geodesic $\gamma$ for $t\ne 0, 1$.)

Remark. There is a slightly unpleasant technical issue here, namely that $h(s,t)$ is smooth except, possibly, on a finite union of lines in ${\mathbb R}^2$, this lack of smoothness happens when one of the points $p_i, q_j$ lies on the geodesic $\gamma$. We will address this issue by getting a uniform bound away from these lines and integrating on the interval $0\le t\le 1$.

Working away from these lines, we now have the smooth function $h(s,t)$. Its point of minimum $s_t$ is given by the equation $$ \frac{\partial h}{\partial s}(s_t)=0. $$ By the implicit function theorem, we obtain $$ |\frac{ds_t}{t}|= |\frac{\partial Dh}{\partial t}(s_t)| \cdot |\frac{\partial D^2h}{\partial s^2}(s_t)|^{-1} $$ Thus, our goal is to bound $|\frac{\partial Dh}{\partial t}(s_t)|$ from above and $\frac{\partial D^2h}{\partial s^2}(s_t)$ from below, away from $0$ (it is already positive by convexity, the issue is a uniform positive lower bound). Note that since $b_P, b_Q$ and, hence, the geodesic segment $b_Pb_Q$ are contained in the union of the convex hulls of $P$ and $Q$ and which has diameter $\le 2$, it follows that

An upper bound for the numerator: $$ \frac{\partial Dh}{\partial t} = 2 \sum_i ((1-t)\phi_i(s) + t \psi_i(s))(\psi_i(s) - \phi_i(s)). $$ Here and below, $\phi_i(s)= \Phi_i(\gamma(s))$, $\psi_i(s)= \Psi_i(\gamma(s))$. Since the distance functions are $1$-Lipschitz, we obtain: $$ |\psi_i(s) - \phi_i(s)|\le d(p_i,q_i), i=1,...,n. $$ On the other hand, since $b_Pb_Q$ is contained in a subset of diameter $\le 2$ in $M$, we obtain $$ \phi_i(s)\le 2, \psi_i(s)\le 2 $$ for all $s\in [s_0, s_1]$. Hence, for $s\in [s_0, s_1]$, $$ |\frac{\partial h}{\partial t}|\le 2n \max_i d(p_i,q_i). $$
Lower bound for the denominator. We have: $$ \frac{\partial D^2h}{\partial s^2}= 2\sum_i [ ((1-t)\phi_i' + t \psi_i'))^2 + ((1-t)\phi_i + t \psi_i)) ((1-t)\phi_i'' + t \psi_i'')) $$ Each $i$-th term of this sum is $\ge 0$, thus, it suffices to bound the sum of the first two terms ($i=1, 2$) from below. By the triangle inequality, for each $s$, $$d(\gamma(s), p_1) + d(\gamma(s), p_2)\ge d(p_1, p_2)=1$$ and $$ d(\gamma(s), q_1) + d(\gamma(s), q_2) \ge 0.8, $$ (since we assumed that $\max_i d(p_i, q_i)\le 0.1$). Therefore, it suffices to consider those values of $s$ for which $d(\gamma(s), p_1)\ge 0.5$ and $d(\gamma(s), q_1)\ge 0.4$ and bound below the first term ($i=1$), otherwise, we will be bounding the second term.

To estimate the terms $\phi_1'(s)$ and $\psi_1'(s)$ we will use the 1st variation formula. Let $\alpha=\alpha_1(s)$ denote the angle $\angle p_1 \gamma(s) b_P$; let $\beta=\beta_1(s)$ denote the angle $\angle q_1 \gamma(s) b_P$. The first variation formula for the length of the segment $p_1\gamma(s)$ then reads $$ \phi_1'(s)= \frac{1}{2} \cos(\alpha)/d(p_1,\gamma(s))= \frac{1}{2} \cos(\alpha)/\phi_1(s) $$ and, similarly, $$ \psi_1'(s)= \frac{1}{2} \cos(\beta)/\psi_1(s). $$ We also have $\phi_1(s), \psi_1(s)\in [0,2]$. Since $d(p_1,q_1)\le 0.1$ and $d(\gamma(s), p_1)\ge 0.5, d(\gamma(s), q_1)\ge 0.4$, we obtain a bound on the difference $|\alpha-\beta|$: $$ |\sin((\alpha- \beta)/2)|\le 1/8, $$ $$ |\alpha- \beta|\le 0.06 \approx \pi/25. $$ Thus, as long as, say, $\alpha\in [0, \pi/6]\cup [\pi- \pi/6, \pi]$, we obtain the uniform lower bound $$ ((1-t)\phi_1' + t \psi_1'))^2 \ge \frac{1}{4}>0. $$

Thus, consider the case when $$\alpha\in [\pi/6, \pi- \pi/6], \beta\in [\pi/12, \pi- \pi/12].$$ Consider the term $$ ((1-t)\phi_1 + t \psi_1)) ((1-t)\phi_1'' + t \psi_1''). $$ We have: $$ \phi_1(s)\ge 0.5, \psi_1(s)\ge 0.4 $$ and $$ (1-t)\phi_1(s) + t \psi_1(s)\ge 0.4. $$ It remains to bound $((1-t)\phi_1'' + t \psi_1'')$. Taking into account that $M$ has nonpositive sectional curvature, we obtain: $$ \phi_1''(s)\ge \frac{\sin^2(\alpha)}{d(p_1, \gamma(s))} \ge 0.5 \sin^2(\alpha) \ge \frac{1}{8} $$ and $$ \phi_1''(s)\ge \frac{\sin^2(\beta)}{d(q_1, \gamma(s))} \ge 0.5 \sin^2(\beta) \ge \frac{1}{32}. $$ Thus, $$ ((1-t)\phi_1 + t \psi_1)) ((1-t)\phi_1'' + t \psi_1'') \ge \frac{0.4}{32}. $$

In either case, we obtain a uniform upper bound
$$ |\frac{ds_t}{t}| \le 80n \max_i d(p_i,q_i) $$ By integration on $[0,1]$, we obtain: $$ d(b_P, b_Q)= |s_1 - s_0|\le 80n \max_i d(p_i,q_i) $$ This is the required Lipschitz property of barycenters. qed

Edit. For anybody who might still care (and for myself if I cannot find this reference in the future):

K-T.Sturm proved a general (and much sharper) result about Lipschitz property of barycenter map for probability measures on $CAT(0)$ spaces $X$:

Theorem. (Theorem 6.3 in [1]) Let $B(\mu)$ denote the barycenter of a probability measure $\mu$ on $X$. Then $$ d(B(\mu_1), B(\mu_2))\le d^{W}(\mu_1, \mu_2) $$ where $d^W$ is the Wasserstein distance between the measures.

[1] K.-T. Sturm. Probability measures on metric spaces of nonpositive curvature. In Heat kernels and analysis on manifolds, graphs, and metric spaces. Contemp. Math. 338 (2003), 357-390

(1) In a Hardmard space $(X ,|\ |)$, define a convex set $K$ wrt $|\ |$ Then then we have $$ f : X\rightarrow \mathbb{R},\ f(x)=|xK|\ ({\rm or}\ |xK|^2) $$ Note that $f$ attains a minimum at $x_0$ and $x_0$ is unique So we have $$ \pi : X\rightarrow K,\ \pi (x)=x_0 $$

Then $\pi$ is distance nonexpanding

Proof : Note that $$\angle (xx_0,y_0x_0),\ \angle (yy_0,x_0y_0) > \frac{\pi}{2}$$

If ${\rm geod}_{[ab]}(t)$ is unit speed geodesic from $a$ to $b$, then we have first variation formula, since $X$ is Hadamard space, $$ |{\rm geod}_{[x_0x]} (t)- {\rm geod}_{[y_0y]} (\tau ) | = |x_0y_0| -t\cos\ \angle (xx_0,y_0x_0) -\tau \cos\ \angle (yy_0,x_0y_0) + o(t+\tau ) $$ Hence we complete the proof

(2) Now we will apply to our case : If we have $(H,d)$, then define $$ (X,|\ |)= \underbrace{(H,d)\times \cdots \times (H,d)}_{n\ times} $$ where $$ |(p_i)(p_i')| := \sqrt{ \sum_i d(p_i,p_i')^2 } $$

Then we have a convex set $K:=\Delta H$ And let $f((p_i)):=|(p_i)K|=\inf_{x\in H}\ \sqrt{\sum_i d(p_i,x)^2} $

By (1) since $\pi$ is distance nonexpanding so $$ \pi:X\rightarrow K,\ \pi((x_i))=(a_i),\ a_i=x_0,\ \pi((y_i))=(b_i),\ b_i=y_0$$ $$\sqrt{n}d(x_0,y_0) \leq \sqrt{\sum_i d(x_i,y_i)^2} $$

This complete the proof

Why is the Barycenter operation in Hadamard spaces Lipschitz continuous?

Tags:

Convex Optimization

Convex Geometry

Convex Analysis

Metric Geometry

Related

Recent Posts