Matrix derivative $(Ax-b)^T(Ax-b)$

Perhaps some help computing the partial derivatives would be appreciated. It is always best to be explicit when one is getting confused by heavy notation. Write $A = (a_{ij})$, $x = (x_1,\dots,x_n)^{\top}$ and $b = (b_1,\dots, b_m)^{\top}$, assuming $A$ is an $m \times n$ matrix. Then the $i^{\text{th}}$ component of $Ax-b$ is
$$ (Ax-b)_i = \left( \sum_{j=1}^n a_{ij} x_j \right) - b_i, $$
so that
$$ (Ax-b)^{\top} (Ax-b) = \sum_{i=1}^m (Ax-b)_i^2 = \sum_{i=1}^m \left( \left( \sum_{j=1}^n a_{ij} x_j \right) - b_i \right)^2. $$

Suppose you want to compute the derivative with respect to $x_k$, $1 \le k \le n$ (I choose $k$ because $i$ and $j$ are already in use as subscripts). Then
$$ \frac{\partial}{\partial x_k} (Ax-b)^{\top} (Ax-b) = \sum_{i=1}^m \frac{\partial}{\partial x_k} \left( \left( \sum_{j=1}^n a_{ij} x_j \right) - b_i \right)^2 = \sum_{i=1}^m 2 \left( \left( \sum_{j=1}^n a_{ij}x_j \right) - b_i \right) a_{ik}. $$

In particular, if we let $A_k = (a_{1k},a_{2k},\dots,a_{mk})$ denote the $k^{\text{th}}$ column of $A$, then
$$ \frac{\partial}{\partial x_k} (Ax-b)^{\top} (Ax-b) = 2 \langle A_k, Ax-b \rangle, $$
where $\langle - , - \rangle$ denotes the inner product. You can use convexity arguments to show that any critical point is a minimizer in this case, although the minimizer will not always be unique, even when $m=n$: it suffices for $A$ to be singular for this to happen.
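
If you want to convince yourself of this coordinate formula numerically, here is a minimal sanity check (a sketch assuming only numpy; $A$, $b$ and $x$ are random placeholders):

```python
import numpy as np

# Check d/dx_k (Ax-b)^T (Ax-b) = 2 <A_k, Ax - b>, where A_k is the k-th column of A.
rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

f = lambda x: (A @ x - b) @ (A @ x - b)   # (Ax-b)^T (Ax-b)

eps = 1e-6
for k in range(n):
    e_k = np.zeros(n)
    e_k[k] = 1.0
    numeric = (f(x + eps * e_k) - f(x - eps * e_k)) / (2 * eps)  # central difference
    analytic = 2 * A[:, k] @ (A @ x - b)                         # 2 <A_k, Ax - b>
    print(k, np.isclose(numeric, analytic, atol=1e-5))
```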

Hope that helps,


I assume you are trying to minimize $\langle A x -b, A x - b\rangle.$ This is always nonnegative, so if $A$ is square and nonsingular, the minimum is $0,$ attained at $x = A^{-1}b.$ Otherwise, the $i$-th component of the gradient is

$$\langle A_i, Ax - b\rangle + \langle A x - b, A_i\rangle,$$

where $A_i$ is the $i$-th column of $A.$ What does this tell you?
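
One way to see it: stacking these components over all columns gives the full gradient $2A^{\top}(Ax-b)$, so a critical point satisfies $A^{\top}(Ax-b) = 0$. A minimal numerical sketch of this (assuming only numpy and, for the solve, that $A^{\top}A$ is invertible; $A$ and $b$ are random placeholders):

```python
import numpy as np

# The gradient components 2 <A_i, Ax - b> stack into 2 A^T (Ax - b),
# so a critical point satisfies the normal equations A^T A x = A^T b.
rng = np.random.default_rng(1)
m, n = 6, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x_star = np.linalg.solve(A.T @ A, A.T @ b)   # assumes A^T A is invertible
grad = 2 * A.T @ (A @ x_star - b)            # gradient at the critical point
print(np.allclose(grad, 0))                  # True up to rounding
```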


The derivation becomes a lot simpler if we take the derivative with respect to the entire $x$ in one go:

$$\frac{\partial}{\partial x}(Ax-b)^T(Ax-b) \ \ = \ \ 2(Ax-b)^T\frac{\partial}{\partial x}(Ax-b) \ \ = \ \ 2(Ax-b)^TA$$

This follows from the product rule:

$$\frac{\partial}{\partial x}uv \ \ = \ \ \frac{\partial u}{\partial x}v+u\frac{\partial v}{\partial x}$$

And from the fact that we can swap the order of the dot product ($u^Tv = v^Tu$, since both sides are scalars):

$$\frac{\partial}{\partial x}u^Tu \ \ = \ \ \frac{\partial u^T}{\partial x}u+u^T\frac{\partial u}{\partial x} \ \ = \ \ \left(\frac{\partial u}{\partial x}\right)^Tu+u^T\frac{\partial u}{\partial x} \ \ = \ \ 2u^T\frac{\partial u}{\partial x}$$
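
Both ingredients are easy to check numerically if you want to see them in action (a sketch assuming only numpy; $A$, $b$ and $x$ are random placeholders): the Jacobian of $u(x) = Ax-b$ is $A$ itself, and $2(Ax-b)^TA$ matches a finite-difference gradient of $f(x) = (Ax-b)^T(Ax-b)$.

```python
import numpy as np

# Check the two ingredients: the Jacobian of u(x) = Ax - b is A,
# and the row vector 2 (Ax - b)^T A matches a finite-difference
# gradient of f(x) = (Ax - b)^T (Ax - b).
rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

u = lambda x: A @ x - b
f = lambda x: u(x) @ u(x)

eps = 1e-6
I = np.eye(n)
jac_u = np.column_stack([(u(x + eps * I[:, k]) - u(x - eps * I[:, k])) / (2 * eps)
                         for k in range(n)])
print(np.allclose(jac_u, A, atol=1e-5))                        # du/dx = A

num_grad = np.array([(f(x + eps * I[:, k]) - f(x - eps * I[:, k])) / (2 * eps)
                     for k in range(n)])
print(np.allclose(num_grad, 2 * (A @ x - b) @ A, atol=1e-4))   # 2 (Ax-b)^T A
```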

I just learned matrix calculus from the Matrix Cookbook yesterday, and I love it :)