What does QR decomposition have to do with the least squares method?

In the $QR$-decomposition, $Q$ is an orthogonal matrix. One property of these matrices is that they do not change the length of vectors (in the 2-norm), and that $Q^{-1}=Q^T$. Thus, we have that $$\Vert Ax - b \Vert = \Vert QRx - b \Vert = \Vert Q(Rx - Q^Tb) \Vert = \Vert Rx - Q^Tb \Vert.$$

In this way, we can reduce the least squares problem to one involving the upper triangular matrix $R$, which can be solved by back substitution.
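For concreteness, here is a minimal numerical sketch of this reduction (using NumPy/SciPy; the matrix $A$ and vector $b$ are made-up example data). It computes the reduced $QR$ factors, solves the triangular system $Rx=Q^Tb$ by back substitution, and checks the result against `numpy.linalg.lstsq`:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))          # 50 observations, 3 unknowns (example data)
b = rng.normal(size=50)

Q, R = np.linalg.qr(A)                # reduced QR: Q is 50x3, R is 3x3 upper triangular
x_qr = solve_triangular(R, Q.T @ b)   # back-substitute R x = Q^T b

x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_qr, x_ref))       # True: both give the least squares solution
```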


$\sf{QR}$ decomposition is particularly important in least squares estimation of a nonlinear model $y_n=f(\boldsymbol x_n,\boldsymbol\beta)+\epsilon_n$, $n=1,\cdots,N$, for which the estimates cannot be obtained analytically. One method to tackle this is the Gauss-Newton method, which briefly goes as follows:

  • Guess the parameter estimates $\boldsymbol\beta^0$ and approximate $f(\boldsymbol x_n,\boldsymbol\beta)$ as a first-order Taylor series about $\boldsymbol\beta^0$: $$f(\boldsymbol x_n,\boldsymbol\beta)\approx f(\boldsymbol x_n,\boldsymbol \beta^0)+\nu_{n1}(\beta_1-\beta_1^0)+\cdots+\nu_{nP}(\beta_P-\beta_P^0)$$ where $\nu_{np}=\frac{\partial f(\boldsymbol x_n,\boldsymbol\beta)}{\partial\beta_p}\bigg|_{\boldsymbol\beta^0}$ with $p=1,\cdots,P$.

  • Let $\boldsymbol\epsilon=\boldsymbol y-\tau(\boldsymbol\beta)$ where $\tau(\boldsymbol\beta)$ is the $N\times1$ vector with its $n$th element being $f(\boldsymbol x_n,\boldsymbol\beta)$ for $n=1,\cdots,N$. Then $\tau(\boldsymbol\beta)\approx\tau(\boldsymbol\beta^0)+\boldsymbol V^0(\boldsymbol\beta-\boldsymbol\beta^0)$ where $\boldsymbol V^0$ is the design matrix with dimensions $N\times P$ and elements $\nu_{np}$.

  • Thus we have $\boldsymbol\epsilon\approx\boldsymbol\epsilon^0-\boldsymbol V^0\boldsymbol\delta$, where $\boldsymbol\epsilon^0=\boldsymbol y-\tau(\boldsymbol\beta^0)$ and $\boldsymbol\delta=\boldsymbol\beta-\boldsymbol\beta^0$, and we want to minimise $\Vert\boldsymbol\epsilon^0-\boldsymbol V^0\boldsymbol\delta\Vert^2$ over $\boldsymbol\delta$. This is a linear least squares problem, which can be solved using $\sf{QR}$ decomposition as shown below:

  • Perform a $\sf{QR}$ decomposition of $\boldsymbol V^0=\boldsymbol Q\boldsymbol R=\boldsymbol Q_1 \boldsymbol R_1$, where $\boldsymbol Q_1$ consists of the first $P$ columns of $\boldsymbol Q$ and $\boldsymbol R_1$ is the upper triangular $P\times P$ top block of $\boldsymbol R$. Then the Gauss increment is given by $\boldsymbol\delta^0=\boldsymbol R_1^{-1}\boldsymbol Q_1^T\boldsymbol\epsilon^0$, i.e. the solution of the triangular system $\boldsymbol R_1\boldsymbol\delta^0=\boldsymbol Q_1^T\boldsymbol\epsilon^0$.

  • Set $\boldsymbol\beta^1=\boldsymbol\beta^0+\boldsymbol\delta^0$ and compute $\tau(\boldsymbol\beta^1)$, which should be closer to $\boldsymbol y$ than $\tau(\boldsymbol\beta^0)$. Repeat with $\boldsymbol\beta^1$ in place of $\boldsymbol\beta^0$ until convergence is reached.

As you can see, $\sf{QR}$ decomposition is crucial to the minimisation of the error term in a nonlinear model: each Gauss-Newton iteration is itself a linear least squares problem, solved via $\sf{QR}$.
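For concreteness, below is a minimal Gauss-Newton sketch following the steps above (NumPy/SciPy). The model $f(x,\boldsymbol\beta)=\beta_1 e^{\beta_2 x}$, the synthetic data and the starting guess are illustrative assumptions, not part of the derivation itself; each iteration solves the triangular system $\boldsymbol R_1\boldsymbol\delta=\boldsymbol Q_1^T\boldsymbol\epsilon^0$ obtained from the thin $\sf{QR}$ decomposition of the derivative matrix:

```python
import numpy as np
from scipy.linalg import solve_triangular

def f(x, beta):
    # Example model: f(x, beta) = beta1 * exp(beta2 * x)
    return beta[0] * np.exp(beta[1] * x)

def jacobian(x, beta):
    # N x P matrix V with elements v_np = d f(x_n, beta) / d beta_p
    return np.column_stack([np.exp(beta[1] * x),
                            beta[0] * x * np.exp(beta[1] * x)])

# Synthetic data from "true" parameters (2.0, -1.0) plus a little noise
rng = np.random.default_rng(1)
x = np.linspace(0.0, 2.0, 30)
y = 2.0 * np.exp(-1.0 * x) + 0.01 * rng.normal(size=x.size)

beta = np.array([1.0, 0.0])                     # initial guess beta^0
for _ in range(50):
    eps = y - f(x, beta)                        # epsilon^0 = y - tau(beta^0)
    Q1, R1 = np.linalg.qr(jacobian(x, beta))    # thin QR of V^0
    delta = solve_triangular(R1, Q1.T @ eps)    # solve R1 delta = Q1^T epsilon^0
    beta = beta + delta                         # Gauss increment
    if np.linalg.norm(delta) < 1e-10:           # stop when the step is negligible
        break

print(beta)   # should be close to (2.0, -1.0)
```

A production implementation would typically add step halving or a Levenberg-Marquardt damping term, since the raw Gauss increment can overshoot when the starting guess is poor.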