Prove $\| A(A^TA)^{-1}A^T\|_2 = 1$ when the $m\times n$ matrix $A$ has rank $n$

I will explain it from a geometric perspective.

The matrix $P=A(A^TA)^{-1}A^T$ is an orthogonal projection matrix satisfying $P^2=P$ and $P^T=P$. For any $x\in\mathbb{R}^m$, $Px$ is the orthogonal projection of $x$ onto the column space of $A$.

  • Consider an orthogonal basis $\{x_i\}_{i=1}^n$ of $\mathrm{Range}(A)$. Then $Px_i=x_i$.
  • Consider an orthogonal basis $\{y_i\}_{i=1}^{m-n}$ of the orthogonal complement of $\mathrm{Range}(A)$. Then $Py_i=0$.

Therefore, $P$ has $n$ eigenvalues equal to 1 and $m-n$ eigenvalues equal to 0. Since $P$ is symmetric positive semi-definite, its singular values are equal to its eigenvalues. As a result, $\|P\|_2=\sigma_{\max}(P)=1$.
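As a numerical sanity check of the eigenvalue argument, here is a minimal pure-Python sketch. The $3\times 2$ matrix `A` and the helper names (`projector`, `inv2`, `matmul`, and so on) are made up for illustration and are not part of the question. It checks $P^2=P$ and $P^T=P$, checks that the trace (the sum of the eigenvalues) equals $n=2$, and estimates the largest eigenvalue by power iteration.

```python
from math import sqrt

# Hypothetical example: a 3x2 matrix of rank 2, so m = 3 and n = 2.
A = [[1.0, 0.0],
     [1.0, 1.0],
     [0.0, 2.0]]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula (enough for n = 2)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def projector(A):
    """P = A (A^T A)^{-1} A^T for a matrix A with two independent columns."""
    At = transpose(A)
    return matmul(matmul(A, inv2(matmul(At, A))), At)

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

P = projector(A)

idempotent_err = max_abs_diff(matmul(P, P), P)    # should be ~0 (P^2 = P)
symmetric_err = max_abs_diff(transpose(P), P)     # should be ~0 (P^T = P)
trace = sum(P[i][i] for i in range(len(P)))       # sum of eigenvalues, ~n

# Power iteration: for a symmetric positive semi-definite matrix, the
# largest eigenvalue is also the spectral norm.
x = [1.0, 2.0, 3.0]
for _ in range(50):
    y = matvec(P, x)
    norm_y = sqrt(sum(t * t for t in y))
    x = [t / norm_y for t in y]
top_eigenvalue = sum(a * b for a, b in zip(x, matvec(P, x)))

print(idempotent_err, symmetric_err, trace, top_eigenvalue)
```

Up to rounding, the two error terms should be zero, the trace should be $2$, and the estimated top eigenvalue should be $1$.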


An argument without geometry goes like this:

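As Shiyu said, $P^2=P$, and hence $\|P\| = \|P^2\|\leq \|P\|^2$. Since $PA=A$ and $A\neq 0$, we have $P\neq 0$, so dividing by $\|P\|>0$ gives $1\leq \|P\|$. Moreover, $P^T=P$, and hence $\|Px\|^2 = \langle Px,Px\rangle = \langle P^TPx, x\rangle = \langle P^2x, x\rangle = \langle Px, x\rangle \leq \|Px\|\|x\|$ by the Cauchy-Schwarz inequality. Thus $\|Px\|\leq\|x\|$ for every $x$, which gives $\|P\| \leq 1$.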


Let $H = A(A^TA)^{-1}A^T$. To see that $x\mapsto Hx$ (for $x\in\mathbb{R}^m$) is the orthogonal projection onto the column space of $A$, it suffices to prove two things:

  • If $x$ is in the column space of $A$, then $Hx=x$.
  • If $x$ is orthogonal to all columns of $A$, then $Hx=0$.

To prove the second statement, notice that if $x$ is orthogonal to all columns of $A$, then $A^T x = 0$. Therefore $A(A^TA)^{-1}A^Tx = 0$.

To prove the first statement, notice that $x$ is in the column space of $A$ iff $x = Aw$, for some $w$. Therefore $$ Hx = HAw = \Big(A(A^TA)^{-1}A^T\Big) Aw = A(A^TA)^{-1}\Big(A^T A\Big)w = Aw = x. $$

Now let $x$ be any vector in $\mathbb{R}^m$. Decompose $x$ into a component in the column space of $A$ and a component orthogonal to the column space of $A$. The component in the column space of $A$ is $u=Hx$. The component orthogonal to the column space of $A$ is $v=(I-H)x$. What then is $\|Hx\|_2$? It is $\|u\|_2$, and since $u\perp v$, the Pythagorean theorem gives $\|u\|_2^2 \le \|u\|_2^2+\|v\|_2^2 = \|u+v\|_2^2 = \|x\|_2^2$. Since $\|Hx\|_2 \le \|x\|_2$ for every $x$, we have $\|H\|_2 \le 1$. Conversely, take any nonzero $u$ in the column space of $A$ (for instance, a column of $A$). Then $Hu=u$, so $\|Hu\|_2= \|u\|_2$ with $u\neq 0$, and therefore $\|H\|_2\ge1$.
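The steps of this argument can be checked numerically on a small example. The following is only a sketch: the $3\times 2$ matrix `A`, the test vectors `x` and `y`, and all helper names are invented for illustration. It checks that $HA=A$, that $Hy=0$ for a vector $y$ orthogonal to the columns of $A$, and that $u=Hx$ and $v=(I-H)x$ are orthogonal with $\|u\|_2\le\|x\|_2$.

```python
from math import sqrt

# Hypothetical example: a 3x2 matrix of rank 2 (m = 3, n = 2).
A = [[1.0, 0.0],
     [1.0, 1.0],
     [0.0, 2.0]]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

def norm(a):
    return sqrt(dot(a, a))

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula (enough for n = 2)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

At = transpose(A)
H = matmul(matmul(A, inv2(matmul(At, A))), At)

# First statement: H fixes every column of A, hence the whole column space.
HA = matmul(H, A)
fix_err = max(abs(p - q) for rp, rq in zip(HA, A) for p, q in zip(rp, rq))

# Second statement: y is orthogonal to both columns of A, so Hy = 0.
y = [2.0, -2.0, 1.0]
Hy_norm = norm(matvec(H, y))

# Decomposition x = u + v with u = Hx and v = (I - H)x.
x = [3.0, -1.0, 2.0]
u = matvec(H, x)
v = [xi - ui for xi, ui in zip(x, u)]

orthogonality = dot(u, v)                            # should be ~0
pythagoras_gap = dot(u, u) + dot(v, v) - dot(x, x)   # should be ~0
Hu_gap = norm(matvec(H, u)) - norm(u)                # should be ~0

print(fix_err, Hy_norm, orthogonality, pythagoras_gap, norm(u), norm(x))
```

In this example $v\neq 0$, so $\|u\|_2$ is strictly smaller than $\|x\|_2$, while $\|Hu\|_2=\|u\|_2$ shows that the bound $\|H\|_2\le 1$ is attained.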