Differentiate $f(x)=x^TAx$

There is another way to solve the problem:

Let $\mathbf{x}=(x_1,\dots ,x_n)'$ be an $n\times 1$ vector. The derivative of a scalar function $y=f(\mathbf x)$ with respect to the vector $\mathbf{x}$ is defined by $$\frac{\partial f}{\partial \mathbf x}=\begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \vdots\\ \frac{\partial f}{\partial x_n} \end{pmatrix}$$ Assume $A=(a_{ij})$ is symmetric. We compute $\frac{\partial f}{\partial x_1}$ by separating out the terms that contain $x_1$; the other coordinates follow in the same way. \begin{align} y&=f(\mathbf x)\\&=\mathbf x'A\mathbf x \\&=\sum_{i=1}^n\sum_{j=1}^n a_{ij}x_ix_j\\&=a_{11}x_1^2+\sum_{i=2}^na_{i1}x_ix_1+\sum_{j=2}^na_{1j}x_1x_j+\sum_{i=2}^n\sum_{j=2}^n a_{ij}x_ix_j \\\frac{\partial f}{\partial x_1} &=2a_{11}x_1+\sum_{i=2}^na_{i1}x_i+\sum_{j=2}^na_{1j}x_j\\&=\sum_{i=1}^na_{i1}x_i+\sum_{j=1}^na_{1j}x_j\\&=\sum_{i=1}^na_{1i}x_i+\sum_{i=1}^na_{1i}x_i \,[\text{since}\,\, a_{i1}=a_{1i}]\\ &=2 \sum_{i=1}^na_{1i}x_i \\ \frac{\partial f}{\partial \mathbf x}&=\begin{pmatrix} 2 \sum_{i=1}^na_{1i}x_i \\ \vdots\\ 2 \sum_{i=1}^na_{ni}x_i \end{pmatrix} \\&=2\begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n}\\ \vdots & \vdots &\ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix}\begin{pmatrix}x_1 \\ \vdots \\ x_n \end{pmatrix}\\ &= 2A\mathbf x \end{align}
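If you want to sanity-check the result $2A\mathbf x$ numerically, here is a minimal sketch in plain Python. The helper names (`quad_form`, `grad_formula`, `grad_numeric`) and the example matrix are my own choices for illustration, and the check assumes $A$ is symmetric. It compares the closed form against a central finite difference in each coordinate.

```python
def quad_form(A, x):
    """Return x' A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def grad_formula(A, x):
    """Closed-form gradient 2 A x (valid when A is symmetric)."""
    n = len(x)
    return [2 * sum(A[k][i] * x[i] for i in range(n)) for k in range(n)]


def grad_numeric(A, x, eps=1e-6):
    """Central finite-difference approximation of each partial derivative."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += eps
        x_minus[k] -= eps
        grad.append((quad_form(A, x_plus) - quad_form(A, x_minus)) / (2 * eps))
    return grad


# Hypothetical example data: a symmetric 3x3 matrix and an arbitrary point.
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 4.0]]
x = [1.0, -2.0, 0.5]

exact = grad_formula(A, x)
approx = grad_numeric(A, x)
print("2Ax            :", exact)
print("finite differences:", approx)
```

The two printed vectors should agree up to rounding error, since a central difference has no truncation error for a quadratic function.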


As a start, things work "as usual": You calculate the difference between $f(x+h)$ and $f(x)$ and check how it depends on $h$, looking for a dominant linear part as $h\to 0$. Here, $f(x+h)=(x+h)^TA(x+h)=x^TAx+ h^TAx+x^TAh+h^TAh$. Since $h^TAx$ is a scalar, it equals its own transpose $x^TA^Th$, which is $x^TAh$ when $A$ is symmetric. Hence $f(x+h)=f(x)+2x^TAh+h^TAh$, so $f(x+h)-f(x)=2x^TA\cdot h + h^TAh$. The first summand is linear in $h$ with a factor $2x^TA$, the second summand is quadratic in $h$, i.e. it goes to $0$ faster than the first and is negligible against it for small $h$. So the row vector $2x^TA$ is our derivative (or, transposed, the column vector $2Ax$ is the gradient). For a non-symmetric $A$ the same computation gives $x^T(A+A^T)$ instead.
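To see the "negligible quadratic part" concretely, here is a small sketch in plain Python. The function names and the example data are assumptions of mine, and $A$ is taken symmetric. It subtracts the linear part $2x^TAh$ from $f(x+h)-f(x)$ for $h=t\,d$ with shrinking $t$ and prints the remainder divided by $t^2$.

```python
def quad_form(A, x):
    """Return x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def linear_part(A, x, h):
    """Return the candidate derivative applied to h, i.e. 2 x^T A h."""
    n = len(x)
    return 2 * sum(x[i] * A[i][j] * h[j] for i in range(n) for j in range(n))


def remainder(A, x, h):
    """Return f(x+h) - f(x) - 2 x^T A h, which should equal h^T A h."""
    x_plus_h = [xi + hi for xi, hi in zip(x, h)]
    return quad_form(A, x_plus_h) - quad_form(A, x) - linear_part(A, x, h)


# Hypothetical example data: symmetric A, base point x, fixed direction d.
A = [[2.0, 1.0],
     [1.0, 3.0]]
x = [1.0, -1.0]
d = [0.5, 2.0]

for t in (1e-1, 1e-2, 1e-3):
    h = [t * di for di in d]
    r = remainder(A, x, h)
    # r / t**2 should stay close to d^T A d, so r shrinks like t**2.
    print(t, r, r / t**2)
```

The last column stays near the constant $d^TAd$, which is exactly the statement that the remainder is quadratic in $h$ while the linear part only shrinks like $t$.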


@Hagen von Eitzen's answer is certainly the fastest route here, but since you asked, here is an argument via the chain rule.

Here are two useful facts about bounded linear and bilinear maps between normed vector spaces.

If $f$ is linear and bounded, then trivially: $$ df_x(h)=f(h). $$

And if $g$ is bilinear and bounded ($\|g(h,k)\|\leq C\|h\|\|k\|$), we have $$ dg_{(x,y)}(h,k)=g(x,k)+g(h,y). $$

Now take $f(x)=(x,x)$ and $g(x,y)=x^tAy$. The former is linear and bounded, the latter is bilinear and bounded.

So, by the chain rule, $g\circ f(x)=x^tAx$ is differentiable and $$ d(g\circ f)_x(h)=dg_{f(x)}\circ df_x(h)=dg_{(x,x)} (h,h)=x^tAh+h^tAx. $$

This is true for any matrix $A$. Now if $A$ is symmetric, this can be simplified since $$ x^tAh+h^tAx=x^tAh+h^tA^tx=x^tAh+(Ah)^tx=2x^tAh. $$

Dropping $h$, i.e. identifying the linear map $h\mapsto 2x^tAh$ with the row vector that represents it, this gives $$ d(g\circ f)_x=2x^tA. $$
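As a quick check of the general formula $x^tAh+h^tAx$, here is a minimal sketch in plain Python with a deliberately non-symmetric $A$. The function names and the example data are my own, chosen only for illustration. It compares the formula with a central finite difference along $h$, and also shows that $2x^tAh$ is not the right answer when $A$ is not symmetric.

```python
def bilinear(A, u, v):
    """Return g(u, v) = u^t A v for a square matrix A (list of rows)."""
    n = len(u)
    return sum(u[i] * A[i][j] * v[j] for i in range(n) for j in range(n))


def differential(A, x, h):
    """Chain-rule result d(g o f)_x(h) = x^t A h + h^t A x, for any A."""
    return bilinear(A, x, h) + bilinear(A, h, x)


def directional_fd(A, x, h, eps=1e-6):
    """Central finite difference of x -> x^t A x along the direction h."""
    x_plus = [xi + eps * hi for xi, hi in zip(x, h)]
    x_minus = [xi - eps * hi for xi, hi in zip(x, h)]
    return (bilinear(A, x_plus, x_plus) - bilinear(A, x_minus, x_minus)) / (2 * eps)


# Hypothetical example data: A is NOT symmetric here.
A = [[1.0, 2.0],
     [0.0, 3.0]]
x = [1.0, 1.0]
h = [1.0, 0.0]

general = differential(A, x, h)          # x^t A h + h^t A x
symmetric_only = 2 * bilinear(A, x, h)   # 2 x^t A h, valid only if A = A^t
numeric = directional_fd(A, x, h)

print("x^t A h + h^t A x :", general)
print("2 x^t A h         :", symmetric_only)
print("finite difference :", numeric)
```

The finite difference should match the first line and not the second, which illustrates why the simplification to $2x^tA$ needs $A$ symmetric.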