Gradient of $\mbox{dist}\left(x, D \right)^2:= \left\| x - P_{D}(x)\right\|_2^2$, where $P_{D}(x)$ is a projection operator

There is a fast way of proving this as a corollary of the result $$\nabla(M_{\gamma f})=\gamma^{-1}(\textrm{Id}-\textrm{prox}_{\gamma f}),\tag{*}$$ where $\gamma\in\mathbb{R}_{++}$ and $M_{\gamma f}$ is the Moreau envelope of a proper, lower-semicontinuous, convex function $f:\mathbb{R}^n\to]-\infty,+\infty]$, taken with the convention $M_{\gamma f}(x)=\inf_y \left( f(y)+\frac{1}{2\gamma}\|x-y\|^2 \right)$. This result appears in Corollary 12.31 of Bauschke & Combettes' book, vol. 2. The argument essentially states that if you let $\gamma=1$ and let $f$ be the $0$-$\infty$ indicator function of the set $D$, then $M_{\gamma f}=\textrm{dist}^2_D/2$ and $\textrm{prox}_{\gamma f}=P_D$. Then you just multiply (*) by $2$ to get $\nabla\,\textrm{dist}^2_D = 2(\textrm{Id}-P_D)$.

I'd be interested to see a more direct proof using less "heavy-duty" machinery.


Here is a proof using nonsmooth (nondifferentiable) calculus.

Let $d_D(x) = \min_{d \in D} \|x-d\|^2$ (note that this is the squared distance). The $\min$ is attained at a unique point $P_D(x)$ because $D$ is nonempty, closed and convex.

If we pick some $x^*$ and restrict $x$ to the closed ball $\overline{B}(x^*,1)$, we can assume that $D$ is compact. To see this, pick $R=\sqrt{d_D(x^*)}+1$ and let $D' = D \cap \overline{B}(x^*,R+1)$. Then $d_D(x) \le \|x-P_D(x^*)\|^2 \le (\|x-x^*\| + \sqrt{d_D(x^*)})^2 \le R^2$, so $\|x-P_D(x)\| \le R$ and hence $\|x^*-P_D(x)\| \le \|x^*-x\| + \|x-P_D(x)\| \le R+1$. In particular, $P_D(x) \in D'$ and so, locally, $d_D(x) = d_{D'}(x)$. Since $D'$ is closed and bounded, we may assume that $D$ is bounded and hence compact.

We can write $d_D(x) = - g(x)$, where $g(x)=\max_{d \in D} \phi(x,d)$ and $\phi(x,d) = - \|x-d\|^2$. Since $g$ is locally Lipschitz, it has a (Clarke) generalised gradient, and we can compute it by $\partial g(x) = \operatorname{co} \left\{ \frac{\partial \phi(x,d)}{\partial x} \right\}_{d \in I(x)}$ with $I(x) = \{ d \in D \mid \phi(x,d) = g(x) \}$, the set of maximisers. Since the maximiser is unique, $I(x) = \{P_D(x)\}$, so the generalised gradient is a singleton; it follows that $g$ is differentiable and $\frac{\partial g(x)}{\partial x} = \frac{\partial \phi(x,P_D(x))}{\partial x} = - 2(x-P_D(x))^T$. Hence $d_D$ is differentiable and $\frac{\partial d_D(x)}{\partial x} = 2(x-P_D(x))^T$.
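As a sanity check on the formula $\nabla d_D(x) = 2(x-P_D(x))$, here is a minimal numerical sketch. It assumes a specific choice of $D$ that is not in the argument above, namely the box $[0,1]^4$, whose projection is coordinatewise clipping; the helper names (`project_box`, `dist_sq`, `grad_formula`, `grad_fd`) are made up for illustration. It compares the claimed gradient with a central finite-difference approximation at random points.

```python
import random

def project_box(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^n (an assumed example of D)."""
    return [min(max(xi, lo), hi) for xi in x]

def dist_sq(x):
    """Squared distance from x to the box, i.e. ||x - P_D(x)||^2."""
    p = project_box(x)
    return sum((xi - pi) ** 2 for xi, pi in zip(x, p))

def grad_formula(x):
    """Claimed gradient 2 (x - P_D(x))."""
    p = project_box(x)
    return [2.0 * (xi - pi) for xi, pi in zip(x, p)]

def grad_fd(x, h=1e-6):
    """Central finite-difference approximation of the gradient of dist_sq."""
    g = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        g.append((dist_sq(xp) - dist_sq(xm)) / (2.0 * h))
    return g

random.seed(0)
max_err = 0.0
for _ in range(1000):
    x = [random.uniform(-2.0, 3.0) for _ in range(4)]
    err = max(abs(a - b) for a, b in zip(grad_formula(x), grad_fd(x)))
    max_err = max(max_err, err)

# Largest coordinatewise discrepancy between the formula and finite differences.
print(max_err)
```

The discrepancy should be of the order of the step size `h` at worst (it is largest near the faces of the box, where the second derivative jumps). This is only a numerical illustration, not a substitute for the proof.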


Here is a tedious but elementary proof.

Note that the projection is Lipschitz with constant one (that is, nonexpansive): $\|P_D(x)-P_D(y)\| \le \|x-y\|$. This is a standard property of projections onto closed convex sets.

Let $f(x) = \|x-P_D(x)\|^2$. Since $P_D(x) \in D$, we have $f(y) \le \|y-P_D(x)\|^2 = \|x-P_D(x)+y-x\|^2 = f(x) + 2(x-P_D(x))^T(y-x) +\|y-x\|^2$, so $f(y)-f(x) - 2(x-P_D(x))^T(y-x) \le \|y-x\|^2$.

Swapping $x,y$ we get $-(f(y)-f(x) - 2(y-P_D(y))^T(y-x)) \le \|y-x\|^2$.

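Note that $y-P_D(y) = x-P_D(x) + (y-x+P_D(x)-P_D(y))$, so the above becomes \begin{eqnarray} -(f(y)-f(x) - 2(x-P_D(x))^T(y-x)) &\le & \|y-x\|^2-2(y-x+P_D(x)-P_D(y))^T(y-x) \\ &\le& 4 \|y-x\|^2, \end{eqnarray} where the last step uses the Cauchy-Schwarz inequality and the Lipschitz bound on $P_D$ (the constant $4$ is not sharp).

Combining the two estimates gives $|f(y)-f(x) - 2(x-P_D(x))^T(y-x)| \le 4\|y-x\|^2$, and the right-hand side is $o(\|y-x\|)$ as $y \to x$. In particular, $f$ is differentiable at $x$ and $D f(x)h = 2(x-P_D(x))^T h$.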
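The two inequalities above bound the first-order remainder $f(y)-f(x)-2(x-P_D(x))^T(y-x)$ by $4\|y-x\|^2$ in absolute value. The following sketch checks this numerically under an assumed choice of $D$ (the closed unit ball in $\mathbb{R}^3$, which is not part of the original argument); the function names are hypothetical helpers for illustration only.

```python
import math
import random

def project_ball(x, radius=1.0):
    """Euclidean projection onto the closed ball of given radius centred at 0
    (an assumed example of the closed convex set D)."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm <= radius:
        return list(x)
    return [radius * xi / norm for xi in x]

def f(x):
    """Squared distance f(x) = ||x - P_D(x)||^2."""
    p = project_ball(x)
    return sum((xi - pi) ** 2 for xi, pi in zip(x, p))

def remainder(x, y):
    """First-order remainder f(y) - f(x) - 2 (x - P_D(x))^T (y - x)."""
    p = project_ball(x)
    lin = sum(2.0 * (xi - pi) * (yi - xi) for xi, pi, yi in zip(x, p, y))
    return f(y) - f(x) - lin

random.seed(1)
worst_ratio = 0.0
for _ in range(2000):
    x = [random.uniform(-3.0, 3.0) for _ in range(3)]
    y = [random.uniform(-3.0, 3.0) for _ in range(3)]
    gap_sq = sum((yi - xi) ** 2 for xi, yi in zip(x, y))
    # Ratio |remainder| / ||y - x||^2; the proof says this never exceeds 4.
    worst_ratio = max(worst_ratio, abs(remainder(x, y)) / gap_sq)

print(worst_ratio)
```

A quadratic bound on the remainder is exactly what differentiability with derivative $2(x-P_D(x))^T$ requires, so a bounded `worst_ratio` is consistent with the conclusion; random sampling of course only illustrates the inequality and does not prove it.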