Differentiability of the operator norm

Since this is meant as a general answer, it is worth recalling the definition first.

Let $\phi : \mathcal L(\mathbb R^n , \mathbb R^n) \to \mathbb R$ be a map. We call it differentiable at the point $x$ if there exist a linear functional $\psi : \mathcal{L}(\mathbb R^n , \mathbb R^n) \to \mathbb R$ and a neighbourhood $U$ of $x$ (in the operator norm) such that for all $x' \in U$: $$ \phi(x') = \phi(x) + \psi(x' - x) + r(x') $$

where the remainder satisfies $|r(x')| \leq \epsilon(\|x'-x\|_{op})$ for some function $\epsilon : \mathbb R \to \mathbb R$ with $\lim_{h \to 0^+} \frac{\epsilon(h)}{h} = 0$.


The map $\phi(A) = \max_{|v| = 1} |Av| = \|A\|_{op}$, on the given normed vector spaces, is Lipschitz with constant $1$. This follows from the reverse triangle inequality: $$ \bigg|\|A\|_{op} - \|B\|_{op}\bigg| \leq \|A-B\|_{op}, \quad \text{that is,} \quad |\phi(A) - \phi(B)| \leq \|A-B\|_{op} $$
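As a sanity check of the Lipschitz bound, here is a minimal numerical sketch. It assumes NumPy and uses the spectral norm (largest singular value) as the operator norm; the helper names `op_norm` and `lipschitz_gap`, the matrix size, and the sample count are illustrative choices, not part of the argument.

```python
import numpy as np

def op_norm(A):
    # Spectral norm: the largest singular value of A.
    return np.linalg.norm(A, 2)

def lipschitz_gap(A, B):
    # ||A - B|| - | ||A|| - ||B|| |, which the reverse triangle
    # inequality says is never negative.
    return op_norm(A - B) - abs(op_norm(A) - op_norm(B))

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
gaps = [
    lipschitz_gap(rng.standard_normal((4, 4)), rng.standard_normal((4, 4)))
    for _ in range(1000)
]
min_gap = min(gaps)

# Allow a tiny tolerance for floating-point rounding.
print(min_gap >= -1e-12)
```

A random check like this proves nothing, of course; it only confirms that the inequality behaves as expected on sample matrices.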

By Rademacher's theorem, any Lipschitz map on a finite-dimensional space such as $\mathcal L(\mathbb R^n , \mathbb R^n)$ is differentiable almost everywhere (with respect to Lebesgue measure). So you do get differentiability at a lot of points. But no, not everywhere.


To see this, let $I$ be the identity linear transformation on $\mathbb R^n$. We claim $\phi$ is not differentiable at $I$. Think of $I$ as a diagonal matrix of ones.

Suppose $\phi$ is differentiable at $I$, with the neighbourhood $U$ around $I$ given by differentiability. Let $C$ be the orthogonal projection onto the span of a fixed vector $w \neq 0$, that is, the linear transformation sending $w \to w$ and every vector orthogonal to $w$ to zero. Now define the map $\xi : (-\epsilon , \epsilon) \to \mathcal L(\mathbb R^n,\mathbb R^n)$ by $\xi(t) = I + tC$, with $\epsilon > 0$ small enough that $\xi(t) \in U$ for all $t \in (-\epsilon, \epsilon)$.

Think of $C$ as a matrix of zeros except for a single $1$ at one position on the diagonal. Then $tC$ is that same matrix with $1$ replaced by $t$.

The map $\xi$ is affine in $t$, hence differentiable, with derivative $C$ at every point. Now, by the chain rule, $\phi \circ \xi : (-\epsilon , \epsilon) \to \mathbb R$ is differentiable at $t = 0$, in the usual sense.

But what is this map? It is nothing but $t \mapsto \|I+tC\|_{op}$.

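Now let us understand the operators $I+tC$. For $t$ negative (and small, so that $1 + t > -1$), this is a matrix with ones on the diagonal, except for $1+t < 1$ at one position. The operator norm of a diagonal matrix is the largest absolute value of its diagonal entries, so the norm of this matrix is $1$.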

For $t$ positive, we have $1+t > 1$ in that position, which gives the operator norm as $1+t$.

Finally, the description : $$ \phi \circ \xi(t) = \begin{cases} 1 & t \leq 0 \\ 1+t & t > 0 \end{cases} $$

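is that of a function which is not differentiable at $t = 0$: its left derivative there is $0$, while its right derivative is $1$. This contradicts the chain rule conclusion, so $\phi$ is not differentiable at $I$.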
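The piecewise description of $\|I+tC\|_{op}$ can be checked numerically. The following is a minimal sketch, assuming NumPy, $n = 3$, and $C = \operatorname{diag}(1,0,0)$; the function names `phi_xi` and `piecewise` are made up for illustration.

```python
import numpy as np

def phi_xi(t, n=3):
    # Operator (spectral) norm of I + tC, with C = diag(1, 0, ..., 0).
    C = np.zeros((n, n))
    C[0, 0] = 1.0
    return np.linalg.norm(np.eye(n) + t * C, 2)

def piecewise(t):
    # The claimed closed form: 1 for t <= 0, and 1 + t for t > 0.
    return 1.0 if t <= 0 else 1.0 + t

# Compare both expressions on a grid of small t around 0.
ts = np.linspace(-0.5, 0.5, 11)
max_err = max(abs(phi_xi(t) - piecewise(t)) for t in ts)

print(max_err < 1e-9)
```

The grid stays inside $|t| \leq 1/2$, in line with the "small $t$" assumption of the argument; for $t < -2$ the entry $1+t$ would have absolute value larger than $1$ and the formula would change.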

You can probably generalize this in some way to other points.


Consider $V=W=\mathbb R^n$ with $\|\cdot\|$ as the spectral norm on matrices (induced by the 2-norm on vectors). If $\|\cdot\|$ is differentiable at $A$, then for any $B$, the one-sided directional derivatives $$\nabla_B\|A\|=\lim_{h\to0^+}\frac{\|A+hB\|-\|A\|}{h}$$ should satisfy $\nabla_B\|A\|=-\nabla_{-B}\|A\|$, since the derivative is linear in the direction. However, take $A=I$ and $B=\operatorname{diag}(1,0,\dots,0)$, and we have $\nabla_B\|A\|=1$ but $\nabla_{-B}\|A\|=0$.
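The two one-sided directional derivatives can be approximated by finite differences. This is a rough sketch, assuming NumPy, $n = 3$, and a step size $h = 10^{-6}$ picked by hand; `one_sided_derivative` is a hypothetical helper name.

```python
import numpy as np

def one_sided_derivative(A, B, h=1e-6):
    # Forward difference quotient (||A + hB|| - ||A||) / h for small h > 0,
    # using the spectral norm.
    return (np.linalg.norm(A + h * B, 2) - np.linalg.norm(A, 2)) / h

n = 3
A = np.eye(n)
B = np.diag([1.0] + [0.0] * (n - 1))

d_plus = one_sided_derivative(A, B)    # approximates the derivative along B
d_minus = one_sided_derivative(A, -B)  # approximates the derivative along -B

# Differentiability at A would force d_plus == -d_minus.
print(d_plus, d_minus)
```

If the norm were differentiable at $I$, the two printed values would be negatives of each other up to discretization error.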