Product rule for matrix-valued functions and Differentiability of matrix multiplication

The guess was more or less in the right direction, but the meaning of $F'G+FG'$ needs to be stated precisely.

First, we should clarify some notation: for $M_0 \in \mathbb{R}^{n\times n}$, the expression $F'(M_0)[M]$ means the derivative of $F$ at $M_0$, applied to $M$, for $M\in \mathbb{R}^{n\times n}$. Letting $M_0$ be fixed, we'll also use the notation $\Delta F(M):=F(M_0+M)-F(M_0)$.

Now what we will prove is this:

$$(F\cdot G)'(M_0)[M]=F'(M_0)[M]\,G(M_0)+F(M_0)\,G'(M_0)[M] \text{, for all }M\in\mathbb{R}^{n\times n}$$

(notice that $M$ - and every other matrix for that matter - needs to be in the right places, since matrix multiplication is not commutative)
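As a sanity check, here is a quick numerical sketch of the claimed formula. The maps $F(A)=A^2$ and $G(A)=A^3$ are illustrative choices (not from the question), picked because their derivatives at $M_0$ are easy to write down by hand; NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M0 = rng.standard_normal((n, n))  # base point
M = rng.standard_normal((n, n))   # direction

# Illustrative choices: F(A) = A^2, G(A) = A^3.
F = lambda A: A @ A
G = lambda A: A @ A @ A

# Their derivatives at M0 applied to M, computed by hand:
dF = M0 @ M + M @ M0                                  # F'(M0)[M]
dG = M0 @ M0 @ M + M0 @ M @ M0 + M @ M0 @ M0          # G'(M0)[M]

# Right-hand side of the claimed product rule:
claimed = dF @ G(M0) + F(M0) @ dG

# Finite-difference approximation of (F.G)'(M0)[M]:
t = 1e-6
fd = (F(M0 + t * M) @ G(M0 + t * M) - F(M0) @ G(M0)) / t

rel_err = np.linalg.norm(fd - claimed) / np.linalg.norm(claimed)
print(rel_err)  # tiny: the difference quotient matches the formula
```

Swapping the factors in `claimed` (e.g. `G(M0) @ dF + dG @ F(M0)`) produces a large error for generic matrices, which is exactly the noncommutativity point above.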

By definition, we need to show that:

$$\lim_{||M||\to 0}\frac{||\Delta (F\cdot G)(M)-(F'(M_0)[M]\,G(M_0)+F(M_0)\,G'(M_0)[M])||}{||M||}=0 \text{ (*)}$$

Let's make some rearrangements. Notice that:

\begin{align*} \Delta (F\cdot G)(M) &=F(M_0+M)G(M_0+M)-F(M_0)G(M_0) \\ &=(\Delta F(M)+F(M_0))(\Delta G(M)+G(M_0))-F(M_0)G(M_0)\\ &=\Delta F(M) \Delta G(M) + \Delta F(M) G(M_0)+F(M_0)\Delta G(M) \end{align*}

Plugging this into (*) and rearranging the terms, what we actually need to show is:

$$\frac{||\Delta F(M)\Delta G(M)+(\Delta F(M)-F'(M_0)[M])G(M_0)+F(M_0)(\Delta G(M)-G'(M_0)[M])||}{||M||}\to 0$$

as $||M||\to 0$. Using the triangle inequality and the submultiplicativity of the norm, $||AB||\leq ||A||\,||B||$, the last expression is less than or equal to the sum:

$$\frac{||\Delta F(M)\Delta G(M)||}{||M||}+ \frac{||\Delta F(M)-F'(M_0)[M]||}{||M||}||G(M_0)||+||F(M_0)||\frac{||\Delta G(M)-G'(M_0)[M]||}{||M||}$$

Notice that since $F, G$ are differentiable, the last two terms go to zero by the definition of the derivative. So it only remains to prove that the first term, $\frac{||\Delta F(M)\Delta G(M)||}{||M||}$, goes to zero.

Since $\Delta F(M) = F'(M_0)[M] + o(||M||)$ and $\Delta G(M) = G'(M_0)[M] + o(||M||)$, we get:

$$\Delta F(M)\,\Delta G(M)= F'(M_0)[M]\,G'(M_0)[M]+o(||M||)$$

as $||M||\to 0$ (the cross terms are $o(||M||)$, since $||F'(M_0)[M]||$ and $||G'(M_0)[M]||$ are $O(||M||)$). Now, since $F'(M_0)$ and $G'(M_0)$ are bounded linear operators, we have:

\begin{align*} \lim_{||M||\to 0}\frac{||\Delta F(M)\Delta G(M)||}{||M||} &= \lim_{||M||\to 0}\frac{||F'(M_0)[M]\,G'(M_0)[M]||}{||M||} \\ &\leq \lim_{||M||\to 0}\frac{||F'(M_0)[M]||\,||G'(M_0)[M]||}{||M||}\\ &\leq \lim_{||M||\to 0}\frac{||F'(M_0)||\,||M||}{||M||}\,||G'(M_0)[M]||\\ &= ||F'(M_0)||\lim_{||M||\to 0}||G'(M_0)[M]|| \end{align*}

But bounded linear operators are continuous, so $||G'(M_0)[M]||\leq ||G'(M_0)||\,||M||\to 0$ as $||M||\to 0$, and we're done.
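The key estimate above - that the quadratic remainder $\Delta F(M)\Delta G(M)$ is $o(||M||)$ - can also be observed numerically. A small sketch with illustrative smooth maps (again my own choices, not from the question; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
M0 = rng.standard_normal((n, n))
D = rng.standard_normal((n, n))  # fixed direction; we shrink M = t*D

# Illustrative smooth maps:
F = lambda A: A @ A
G = lambda A: np.linalg.inv(A + 3.0 * np.eye(n))  # the shift keeps A + 3I invertible near M0

ratios = []
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    M = t * D
    dF = F(M0 + M) - F(M0)   # Delta F(M)
    dG = G(M0 + M) - G(M0)   # Delta G(M)
    ratios.append(np.linalg.norm(dF @ dG) / np.linalg.norm(M))

print(ratios)  # shrinks roughly linearly in t, i.e. the term is o(||M||)
```

Each factor $\Delta F(M)$, $\Delta G(M)$ is $O(||M||)$, so their product is $O(||M||^2)$, and dividing by $||M||$ leaves something that still vanishes - which is what the decreasing ratios show.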


$E = \mathbb{R}^{n \times n}$ is a Banach space (which, with the given norm, is isomorphic to the space of continuous linear maps $\mathscr{L}(\mathbb{R}^n; \mathbb{R}^n)$).

The usual matrix multiplication $\mu \colon E \times E \to E$, $(B, C) \mapsto BC$ is bilinear; and it is continuous, because $\|BC\| \leqslant \|B\|\|C\|$ for all $B, C \in E$. Continuity follows from e.g. J. Dieudonne, Foundations of Modern Analysis (1969), proposition (5.5.1).

By e.g. Dieudonne's proposition (8.1.4), $\mu$ is differentiable at every point $(B, C)$ in $E \times E$, and its derivative at $(B, C)$ is the continuous linear mapping \begin{equation} \tag{1}\label{eq:mu} \mu'(B, C) \colon E \times E \to E, \ (U, V) \mapsto BV + UC. \end{equation}
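Equation (1) can be checked directly from the algebraic identity $(B+tU)(C+tV)-BC = t(BV+UC) + t^2\,UV$; a minimal numerical sketch (random matrices, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
B, C, U, V = (rng.standard_normal((n, n)) for _ in range(4))

t = 1e-6
# Difference quotient of mu(B, C) = BC along the direction (U, V):
fd = ((B + t * U) @ (C + t * V) - B @ C) / t

# mu'(B, C)(U, V) = BV + UC; the discrepancy is exactly t * U @ V, hence O(t).
err = np.linalg.norm(fd - (B @ V + U @ C))
print(err)
```

Since the remainder $t^2 UV$ is quadratic in $t$, the difference quotient converges to $BV+UC$, which is precisely the statement that $\mu$ is differentiable with derivative (1).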

By e.g. Dieudonne's proposition (8.1.5), the function $$ H \colon E \to E \times E, \ A \mapsto (F(A), G(A)) $$ is differentiable at every point $A \in E$, and - under the natural linear isometry between $\mathscr{L}(E; E \times E)$ and $\mathscr{L}(E; E) \times \mathscr{L}(E; E)$, where both products have the $\sup$ norm - the derivative at $A$ is identified with the continuous linear mapping $$ H'(A) = (F'(A), G'(A)). $$

(I had better now spell that out in more detail - especially as I haven't properly studied differential calculus in Banach spaces! It took some brass nerve to post this answer. Partly it's in order to learn some of this stuff myself, but it is also to show that an answer requires virtually no calculation.)

The natural linear isometry in question sends $L : E \to E \times E$ to $(\operatorname{pr}_1 \circ L, \operatorname{pr}_2 \circ L)$, where $\operatorname{pr}_i$ ($i = 1, 2$) are the projection mappings $E \times E \to E$. So we have \begin{equation} \tag{2}\label{eq:H} H'(A) \colon E \to E \times E, \ T \mapsto (F'(A)(T), G'(A)(T)). \end{equation}

By the Chain Rule (which is Dieudonne's proposition (8.1.2)), \begin{align*} (F \cdot G)'(A) & = (\mu \circ H)'(A) \\ & = \mu'(H(A)) \circ H'(A) \\ & = \mu'(F(A), G(A)) \circ H'(A). \end{align*} Therefore, by \eqref{eq:mu} and \eqref{eq:H}, \begin{align*} (F \cdot G)'(A)(T) & = \mu'(F(A), G(A))(H'(A)(T)) \\ & = \mu'(F(A), G(A))(F'(A)(T), G'(A)(T)) \\ & = \boxed{F(A)G'(A)(T) + F'(A)(T)G(A)} \end{align*} - which agrees with rmdmc89's answer.

By the same argument, the result holds generally for differentiable functions $F: W \to \mathbb{R}^{m \times n}$, $G: W \to \mathbb{R}^{n \times p}$, where $W$ is an open subset of a Banach space and $m, n, p$ are positive integers.
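A quick sketch of the rectangular case, with $W=\mathbb{R}^{n\times n}$ and hypothetical linear maps $F(A)=PA$, $G(A)=AQ$ (my own illustrative choices, so $F'(A)[T]=PT$ and $G'(A)[T]=TQ$); NumPy assumed:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, p = 2, 3, 4
A0 = rng.standard_normal((n, n))  # base point in W = R^{n x n}
T = rng.standard_normal((n, n))   # direction

# Hypothetical rectangular-valued maps:
P = rng.standard_normal((m, n))
Q = rng.standard_normal((n, p))
F = lambda A: P @ A               # values in R^{m x n}
G = lambda A: A @ Q               # values in R^{n x p}

t = 1e-6
# Finite-difference approximation of (F.G)'(A0)[T], an m x p matrix:
fd = (F(A0 + t * T) @ G(A0 + t * T) - F(A0) @ G(A0)) / t

# F'(A0)[T] G(A0) + F(A0) G'(A0)[T]:
claimed = (P @ T) @ G(A0) + F(A0) @ (T @ Q)

err = np.linalg.norm(fd - claimed)  # equals t * ||P T T Q||, so tiny
print(err)
```

Note the dimensions only work out in this order: $F'(A_0)[T]\,G(A_0)$ and $F(A_0)\,G'(A_0)[T]$ are both $m \times p$, matching $(F\cdot G)(A)$.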