How to differentiate a matrix equation w.r.t. a vector?

We use the linearity of differentiation and first consider \begin{align*} g(M)=MAM^T\tag{1} \end{align*} with $M=(M_i)_{1\leq i\leq N}$ a $(1\times N)$-matrix and $A=(A_{ij})_{1\leq i,j\leq N}$ an $(N\times N)$-matrix.

We obtain \begin{align*} dg(M)&=dM\,AM^T+MA\,dM^T\tag{2}\\ \mathrm{vec}(dg(M))&=\mathrm{vec}(dM\,AM^T)+\mathrm{vec}(MA\,dM^T)\tag{3}\\ &=\left(MA^T\otimes I_1\right)\mathrm{vec}(dM)+\left(I_1\otimes MA\right)\mathrm{vec}\left(dM^T\right)\tag{4}\\ &=MA^T\,\mathrm{vec}(dM)+MA\,I_N\,\mathrm{vec}(dM)\tag{5}\\ &=\left(MA^T+MA\right)\mathrm{vec}(dM)\\ \color{blue}{\frac{\partial g(M)}{\partial M}}&=\frac{\partial\,\mathrm{vec}(g(M))}{\partial\,\mathrm{vec}(M)}=\color{blue}{M\left(A^T+A\right)}\tag{6} \end{align*}

Comment:

  • In (2) we start by calculating the differential.

  • In (3) we vectorize the equation.

  • In (4) we use the identity $\mathrm{vec}(XBY)=\left(Y^T\otimes X\right)\mathrm{vec}(B)$ relating vectorization and Kronecker products to factor out $\mathrm{vec}(dM)$ and $\mathrm{vec}(dM^T)$, respectively.

  • In (5) we simplify, dropping the Kronecker factors $I_1=1$, and use $\mathrm{vec}(dM^T)=C\,\mathrm{vec}(dM)$, where the commutation matrix is $C=I_N$ because $M$ is a row vector.

  • In (6) we read off the gradient as the coefficient of $\mathrm{vec}(dM)$.

We can check the result (6) by writing $g$ out in components:

\begin{align*} g(M)&=MAM^T\\ &=\left(M_i\right)_{1\leq i\leq N}\left(A_{ij}\right)_{1\leq i,j\leq N}\left(M_i\right)^T_{1\leq i\leq N}\\ &=\left(\sum_{j=1}^N M_jA_{ji}\right)_{1\leq i\leq N}\left(M_i\right)^T_{1\leq i\leq N}\\ &=\sum_{i=1}^N\sum_{j=1}^N M_jA_{ji}M_i\\ &=\sum_{i=1}^N\sum_{j=1}^N M_iM_jA_{ij} \end{align*}

where the last step swaps the names of the summation indices $i$ and $j$.

We obtain

\begin{align*} \color{blue}{\frac{\partial g(M)}{\partial M}}&=\frac{\partial}{\partial\left(M_1,\ldots,M_N\right)}\left(\sum_{i=1}^N\sum_{j=1}^NM_iM_jA_{ij}\right)\\ &=\left(\frac{\partial}{\partial M_k}\sum_{i=1}^N\sum_{j=1}^N M_iM_jA_{ij}\right)_{1\leq k\leq N}\\ &=\left(\sum_{{j=1}\atop{j\ne k}}^N M_jA_{kj}+\sum_{{i=1}\atop{i\ne k}}^NM_iA_{ik}+2M_kA_{kk}\right)_{1\leq k\leq N}\\ &\,\,\color{blue}{=\left(\sum_{j=1}^NM_j\left(A_{kj}+A_{jk}\right)\right)_{1\leq k\leq N}} \end{align*}

in accordance with (6).
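The componentwise check above can also be done numerically. The following is a minimal sketch in plain Python, assuming $M$ is stored as a list of floats and $A$ as a nested list; the helper names `g`, `grad_g` and `numerical_grad` and the test data are illustrative choices, not part of the derivation. It compares the closed form (6) with central finite differences for a deliberately non-symmetric $A$.

```python
def g(M, A):
    """Quadratic form g(M) = M A M^T for a row vector M given as a list."""
    n = len(M)
    return sum(M[i] * A[i][j] * M[j] for i in range(n) for j in range(n))


def grad_g(M, A):
    """Closed form (6): the k-th entry of M (A^T + A)."""
    n = len(M)
    return [sum(M[j] * (A[k][j] + A[j][k]) for j in range(n)) for k in range(n)]


def numerical_grad(func, M, h=1e-6):
    """Central finite differences, perturbing one coordinate at a time."""
    grad = []
    for k in range(len(M)):
        up = list(M)
        up[k] += h
        down = list(M)
        down[k] -= h
        grad.append((func(up) - func(down)) / (2 * h))
    return grad


# Arbitrary illustrative data; A is deliberately non-symmetric.
A = [[1.0, 2.0, 0.5],
     [-1.0, 3.0, 4.0],
     [0.0, -2.0, 1.5]]
M = [0.7, -1.2, 2.0]

analytic = grad_g(M, A)
numeric = numerical_grad(lambda m: g(m, A), M)
max_error = max(abs(a - b) for a, b in zip(analytic, numeric))
print("analytic gradient:", analytic)
print("largest deviation from finite differences:", max_error)
```

Since $g$ is quadratic, central differences are exact up to rounding, so any visible deviation would point to an index error in the closed form.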

Finally, considering $f(M)=MAM^T-2\sum_{i=1}^N\log(M_i)$, we obtain using (6) \begin{align*} \frac{\partial f(M)}{\partial M}&=M\left(A^T+A\right)-2\frac{\partial}{\partial (M_1,\ldots,M_N)}\sum_{i=1}^N\log(M_i)\\ &=M\left(A^T+A\right)-2\left(\frac{\partial}{\partial M_k}\sum_{i=1}^N\log(M_i)\right)_{1\leq k\leq N}\\ &\,\,\color{blue}{=M\left(A^T+A\right)-2\left(\frac{1}{M_k}\right)_{1\leq k\leq N}} \end{align*}


You can think of $f$ as a function $f:\mathbb R^n\to\mathbb R$. Then, for $u\in\mathbb R^n$, the directional derivative of $f$ along $u$ is $$\frac{\partial f}{\partial u}=\nabla f\cdot u=\begin{pmatrix}\frac{\partial f}{\partial x_1}&\cdots&\frac{\partial f}{\partial x_n}\end{pmatrix}\begin{pmatrix}u_1\\\vdots\\u_n\end{pmatrix}=\sum_{k}\frac{\partial f}{\partial x_k}u_k.$$ In particular, define $f$ on row vectors $M\in\mathbb R^{1\times n}$ with positive entries by $$M\mapsto MAM^T-2\sum_k\log(M_k),$$ where $A\in\mathbb R^{n\times n}$. Then, for a direction $N\in\mathbb R^{1\times n}$, $$\frac{\partial f}{\partial N}(M)=(M(A+A^T)-2M^{\circ-1})\cdot N,$$ where $M^{\circ-1}=(1/M_k)_k$ denotes the entrywise inverse of $M$.

Note that if $A$ is symmetric, this simplifies to $$\frac{\partial f}{\partial N}(M)=2(MA-M^{\circ-1})\cdot N.$$
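As a quick sanity check of the directional-derivative formula and its symmetric special case, here is a small sketch in plain Python. It assumes $M$ has positive entries (so the logarithms are defined); the names `f`, `grad_f`, `directional_derivative`, `numerical_directional` and `direction` (standing in for $N$), as well as the numbers, are hypothetical choices made for illustration.

```python
import math


def f(M, A):
    """f(M) = M A M^T - 2 * sum_k log(M_k), for M with positive entries."""
    n = len(M)
    quad = sum(M[i] * A[i][j] * M[j] for i in range(n) for j in range(n))
    return quad - 2.0 * sum(math.log(m) for m in M)


def grad_f(M, A):
    """Row vector M (A + A^T) - 2 M^{o-1}, entry by entry."""
    n = len(M)
    return [sum(M[j] * (A[k][j] + A[j][k]) for j in range(n)) - 2.0 / M[k]
            for k in range(n)]


def directional_derivative(M, A, direction):
    """Closed form: (M (A + A^T) - 2 M^{o-1}) . N."""
    return sum(gk * dk for gk, dk in zip(grad_f(M, A), direction))


def numerical_directional(M, A, direction, h=1e-6):
    """Central difference of f along the given direction."""
    up = [m + h * d for m, d in zip(M, direction)]
    down = [m - h * d for m, d in zip(M, direction)]
    return (f(up, A) - f(down, A)) / (2 * h)


# Illustrative data: A is symmetric here so both formulas apply.
A = [[2.0, 1.0],
     [1.0, 3.0]]
M = [0.5, 2.0]
direction = [1.0, -1.0]

closed = directional_derivative(M, A, direction)
numeric = numerical_directional(M, A, direction)

# Symmetric shortcut: 2 (M A - M^{o-1}) . N
n = len(M)
symmetric = sum(
    2.0 * (sum(M[j] * A[j][k] for j in range(n)) - 1.0 / M[k]) * direction[k]
    for k in range(n)
)

print("closed form:        ", closed)
print("finite differences: ", numeric)
print("symmetric shortcut: ", symmetric)
```

For a non-symmetric $A$ only the general formula is expected to match the finite-difference value; the shortcut relies on $A=A^T$.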