Matrix calculus in multiple linear regression OLS estimate derivation

Consider the full matrix case of the regression $$\eqalign{ Y &= XB+E \cr E &= Y-XB \cr }$$ where $X$ is $(M\times N)$, $B$ is $(N\times K)$, and $Y$ and $E$ are $(M\times K)$. In this case the function to be minimized is $$f = \|E\|^2_F = E:E$$ where the colon denotes the Frobenius inner product, $A:C = \operatorname{tr}(A^TC) = \sum_{ij}A_{ij}C_{ij}$.
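As a quick numerical sanity check (a minimal NumPy sketch of my own; the dimensions and variable names are illustrative), the Frobenius inner product of $E$ with itself does equal the squared Frobenius norm:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 50, 3, 2          # observations, predictors, responses
X = rng.normal(size=(M, N))
B = rng.normal(size=(N, K))
Y = X @ B + 0.1 * rng.normal(size=(M, K))

E = Y - X @ B
frob_inner = np.sum(E * E)                    # E:E, the Frobenius inner product
frob_norm_sq = np.linalg.norm(E, 'fro') ** 2  # ||E||_F^2
assert np.isclose(frob_inner, frob_norm_sq)
```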

Now find the differential and the gradient, using the adjoint property of the Frobenius product, $A:(XC) = (X^TA):C$, to move $X$ across the colon $$\eqalign{ df &= 2\,E:dE \cr &= -2\,E:X\,dB \cr &= 2\,(XB-Y):X\,dB \cr &= 2\,X^T(XB-Y):dB \cr\cr \frac{\partial f}{\partial B} &= 2\,X^T(XB-Y) \cr }$$ Set the gradient to zero and solve, assuming $X$ has full column rank so that $X^TX$ is invertible $$\eqalign{ X^TXB &= X^TY \cr B &= (X^TX)^{-1}X^TY \cr }$$ This result remains valid when $B$ is an $(N\times 1)$ matrix, i.e. a vector.
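Here is a short NumPy sketch (mine, not part of the original derivation; the names `B_hat`, `B0`, etc. are illustrative) that checks both results: the closed-form estimate against NumPy's least-squares solver, and the gradient formula against finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, K = 100, 4, 3
X = rng.normal(size=(M, N))
Y = rng.normal(size=(M, K))

# Closed-form OLS estimate: solve X^T X B = X^T Y rather than
# forming the explicit inverse (better conditioned, same result)
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's least-squares solver
B_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(B_hat, B_lstsq)

# Finite-difference check of the gradient 2 X^T (X B - Y)
B0 = rng.normal(size=(N, K))
f = lambda B: np.sum((Y - X @ B) ** 2)
grad = 2 * X.T @ (X @ B0 - Y)
eps = 1e-6
num = np.zeros_like(B0)
for i in range(N):
    for j in range(K):
        dB = np.zeros_like(B0)
        dB[i, j] = eps
        num[i, j] = (f(B0 + dB) - f(B0 - dB)) / (2 * eps)
assert np.allclose(grad, num, atol=1e-4)
```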

The problem is that, in the vector case, people tend to write the function in terms of the transpose product $e^Te$ instead of the inner product $e:e$, and then fall down rabbit holes chasing the details of the transpositions.
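For concreteness, here is the vector case written the same way (a worked example of mine, specializing the derivation above to $K=1$); the inner-product notation sidesteps the transposition bookkeeping entirely $$\eqalign{ f &= e^Te = e:e, \qquad e = y - Xb \cr df &= 2\,e:de = -2\,e:X\,db = 2\,X^T(Xb-y):db \cr \frac{\partial f}{\partial b} &= 2\,X^T(Xb-y) \cr }$$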