Convexity and Lipschitz continuity

That's a standard result in convex optimization. For example, Theorem 2.1.5 in Nesterov's "Introductory Lectures on Convex Optimization" states that the following are equivalent:

  • $f$ is $C^1$, convex and the gradient $\nabla f$ is $L$-Lipschitz
  • for all $x,y$: $0\leq f(y) - f(x) - \langle\nabla f(x),y-x\rangle \leq \tfrac{L}2 \|x-y\|^2$
  • for all $x,y$: $\tfrac1L\|\nabla f(x)-\nabla f(y)\|^2 \leq \langle\nabla f(x)-\nabla f(y),x-y\rangle$
  • for all $x,y$: $\langle\nabla f(x)-\nabla f(y),x-y\rangle \leq L\|x-y\|^2$
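These four characterizations are easy to probe numerically. Here is a quick sanity check (my own illustration, not from Nesterov's book) using a convex quadratic $f(x)=\tfrac12 x^\top A x$, where $\nabla f(x)=Ax$ and the sharp Lipschitz constant is $L=\lambda_{\max}(A)$:

```python
import numpy as np

# Sanity check of the four equivalent conditions for the convex quadratic
# f(x) = 0.5 x^T A x with A symmetric PSD: grad f(x) = A x, and the best
# Lipschitz constant of the gradient is L = lambda_max(A).
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T                              # symmetric positive semidefinite
L = float(np.linalg.eigvalsh(A)[-1])     # largest eigenvalue

def f(x):
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

for _ in range(1000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    g, d = grad(x) - grad(y), x - y
    # second bullet: 0 <= f(y) - f(x) - <grad f(x), y - x> <= (L/2)|x-y|^2
    gap = f(y) - f(x) - grad(x) @ (y - x)
    assert -1e-9 <= gap <= 0.5 * L * (d @ d) + 1e-9
    # third bullet (co-coercivity): (1/L)|g|^2 <= <g, x - y>
    assert g @ g / L <= g @ d + 1e-9
    # fourth bullet: <g, x - y> <= L|x - y|^2
    assert g @ d <= L * (d @ d) + 1e-9
print("all four characterizations consistent")
```

The quadratic case is convenient because every quantity is explicit; the equivalence, of course, holds for any $C^1$ convex $f$.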

(In case you are interested: The proof there is directly for $C^1$ functions and no second derivatives are used at intermediate steps.)


Yes

Consider first the case where $f\in{\cal C}^2$. Then $$\nabla f(y)-\nabla f(x)=\int_0^1{\rm D}^2f(x+t(y-x))\cdot(y-x)\,dt.$$ It follows that $$\|\nabla f(y)-\nabla f(x)\|\le\|y-x\|\int_0^1\|{\rm D}^2f(x+t(y-x))\|\,dt.$$ Now, the assumption tells you that $\|{\rm D}^2f(x+t(y-x))\|\le L$, whence the result.
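As a concrete 1-D illustration of this integral identity (my own example, not part of the answer), take the convex function $f(x)=\log(1+e^x)$, whose derivative is the sigmoid $\sigma$ and whose second derivative $\sigma(1-\sigma)$ is bounded by $1/4$, so $f'$ is Lipschitz with $L=1/4$:

```python
import numpy as np

# Numerical check of the identity
#   f'(y) - f'(x) = int_0^1 f''(x + t(y - x)) (y - x) dt
# for f(x) = log(1 + exp(x)): f'(x) = sigma(x), f''(x) = sigma(x)(1 - sigma(x)),
# and f'' <= 1/4, so f' is Lipschitz with constant L = 1/4.
def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

x, y = -1.3, 2.7
t = np.linspace(0.0, 1.0, 100001)
integrand = sigma(x + t * (y - x)) * (1.0 - sigma(x + t * (y - x))) * (y - x)

lhs = sigma(y) - sigma(x)
# trapezoidal rule for the integral over [0, 1]
rhs = np.sum(integrand[:-1] + integrand[1:]) * 0.5 * (t[1] - t[0])

assert abs(lhs - rhs) < 1e-8            # the integral representation holds
assert abs(lhs) <= 0.25 * abs(y - x)    # |f'(y) - f'(x)| <= L |y - x|, L = 1/4
print("integral identity and Lipschitz bound verified")
```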

Now the general case can be obtained by a density argument. Let a convex function $f$ satisfy your assumption. For $\epsilon>0$, let us define a smooth convex function $f_\epsilon$ by inf-convolution: $$f_\epsilon(x)=\inf_z\Big(f(z)+\frac1\epsilon\,\|x-z\|^2\Big).$$ Apply the result to $f_\epsilon$, then pass to the limit as $\epsilon\rightarrow0$.
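To make the inf-convolution concrete, here is a worked 1-D example (mine, with the same scaling as above): for $f(x)=|x|$, the inf-convolution has a closed form, a Huber-type smoothing that is $C^1$ and converges to $|x|$ as $\epsilon\to 0$:

```python
import numpy as np

# The inf-convolution f_eps(x) = inf_z ( f(z) + (1/eps) (x - z)^2 ) of
# f(x) = |x| in closed form: minimizing over z gives z = x -/+ eps/2 when
# |x| > eps/2 and z = 0 otherwise, hence a Huber-like smooth function.
def f_eps_bruteforce(x, eps):
    z = np.linspace(-10.0, 10.0, 400001)
    return np.min(np.abs(z) + (x - z) ** 2 / eps)

def f_eps_closed(x, eps):
    return x * x / eps if abs(x) <= eps / 2 else abs(x) - eps / 4

eps = 0.5
for x in [-2.0, -0.1, 0.0, 0.1, 2.0]:
    assert abs(f_eps_bruteforce(x, eps) - f_eps_closed(x, eps)) < 1e-4
    # f_eps lower-bounds f, and |f_eps - f| <= eps/4, so f_eps -> f as eps -> 0
    assert f_eps_closed(x, eps) <= abs(x) + 1e-12
    assert abs(f_eps_closed(x, 0.01) - abs(x)) <= 0.01 / 4 + 1e-12
print("inf-convolution of |x| matches the closed-form Huber smoothing")
```

The brute-force grid minimization is only there to confirm the closed form; the point is that $f_\epsilon$ is smooth even though $f$ is not.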


This answer is a small modification of Denis Serre's answer. For the reader's convenience: (1) the result is slightly more general; (2) the answer contains many more details; (3) it uses approximation by mollification (convolution) instead of inf-convolution.

Since convex functions satisfy $$ \langle \nabla f(x)-\nabla f(y),x-y\rangle\geq 0, $$ it suffices to prove the following more general result.

Theorem. Let $f\in C^1(\mathbb{R}^n)$ and let $L>0$. Then the following conditions are equivalent:
\begin{equation}
(1)\qquad |\langle\nabla f(x)-\nabla f(y),x-y\rangle|\leq L|x-y|^2 \quad \text{for all $x,y\in\mathbb{R}^n$,}
\end{equation}
\begin{equation}
(2)\qquad |\nabla f(x)-\nabla f(y)|\leq L|x-y| \quad \text{for all $x,y\in\mathbb{R}^n$.}
\end{equation}

Proof. The implication from (2) to (1) is immediate from the Cauchy–Schwarz inequality, so it remains to prove that (1) implies (2).

Assume first that $f\in C^\infty(\mathbb{R}^n)$. For $|u|=1$, (1) yields $$ \left|\left\langle\frac{\nabla f(x+tu)-\nabla f(x)}{t},u\right\rangle\right|\leq L, $$ so passing to the limit as $t\to 0$ gives $$ |\langle D^2f(x)u,u\rangle|\leq L. $$ Since $D^2 f(x)$ is a symmetric matrix, the spectral theorem implies that the operator norm of the matrix $D^2f(x)$ satisfies $$ \Vert D^2f(x)\Vert = \sup_{|u|=1}|\langle D^2f(x)u,u\rangle|\leq L. $$ This estimate easily implies the result: \begin{equation} \begin{split} |\nabla f(x)-\nabla f(y)| &= \left|\int_0^1\frac{d}{dt}\nabla f(y+t(x-y))\, dt\right|\\ &\leq |x-y|\int_0^1\Vert D^2f(y+t(x-y))\Vert\, dt\leq L|x-y|. \end{split} \end{equation} This completes the proof when $f\in C^\infty$.

Assume now that $f\in C^1$ and let $f_\epsilon=f*\varphi_\epsilon$ be a standard approximation by convolution with a mollifier. Recall that $f_\epsilon\in C^\infty$ and $\nabla f_\epsilon=(\nabla f)*\varphi_\epsilon$. We have \begin{equation} \begin{split} & |\langle \nabla f_\epsilon(x)-\nabla f_\epsilon(y),x-y\rangle|= \Big|\Big\langle\int_{\mathbb{R}^n} (\nabla f(x-z)-\nabla f(y-z))\varphi_\epsilon(z)\, dz,x-y\Big\rangle\Big|\\ &\leq \int_{\mathbb{R}^n} \big|\big\langle \nabla f(x-z)-\nabla f(y-z),(x-z)-(y-z)\big\rangle\big|\, \varphi_\epsilon(z)\, dz \leq L|x-y|^2, \end{split} \end{equation} where the last inequality is a consequence of (1) and $\int_{\mathbb{R}^n}\varphi_\epsilon=1$. Since $f_\epsilon\in C^\infty$, the first part of the proof yields $$ |\nabla f_\epsilon(x)-\nabla f_\epsilon(y)|\leq L|x-y|, $$ and the result follows upon passing to the limit as $\epsilon\to 0$ (since $\nabla f_\epsilon\to\nabla f$ pointwise).
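Note that the theorem requires no convexity. As a quick numerical illustration (my own example, not from the answer), take the non-convex function $f(x_1,x_2)=\sin x_1+\cos x_2$, whose gradient $(\cos x_1,\,-\sin x_2)$ is $1$-Lipschitz, and check conditions (1) and (2) with $L=1$ at random pairs of points:

```python
import numpy as np

# Non-convex C^infty example: f(x1, x2) = sin(x1) + cos(x2),
# grad f(x) = (cos(x1), -sin(x2)), which is 1-Lipschitz since cos and
# -sin are. Both conditions (1) and (2) should hold with L = 1.
def grad(p):
    return np.array([np.cos(p[0]), -np.sin(p[1])])

rng = np.random.default_rng(1)
L = 1.0
for _ in range(1000):
    x, y = rng.uniform(-5, 5, 2), rng.uniform(-5, 5, 2)
    g, d = grad(x) - grad(y), x - y
    assert abs(g @ d) <= L * (d @ d) + 1e-12                    # condition (1)
    assert np.linalg.norm(g) <= L * np.linalg.norm(d) + 1e-12   # condition (2)
print("conditions (1) and (2) verified with L = 1")
```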