Product of a vector and its transpose (Projections)

You appear to be conflating the dot product $a\cdot b$ of two column vectors with the matrix product $a^Tb$, which computes the same value. The dot product is symmetric, but matrix multiplication is in general not commutative. Indeed, unless $A$ and $B$ are both square matrices of the same size, $AB$ and $BA$ don’t even have the same shape.

In the derivation that you cite, the vectors $a$ and $b$ are being treated as $n\times1$ matrices, so $a^T$ is a $1\times n$ matrix. By the rules of matrix multiplication, $a^Ta$ and $a^Tb$ each result in a $1\times1$ matrix, which is equivalent to a scalar, while $aa^T$ produces an $n\times n$ matrix: $$ a^Tb = \begin{bmatrix}a_1&a_2&\cdots&a_n\end{bmatrix}\begin{bmatrix}b_1\\b_2\\ \vdots\\b_n\end{bmatrix} = \begin{bmatrix}a_1b_1+a_2b_2+\cdots+a_n b_n\end{bmatrix} \\ a^Ta = \begin{bmatrix}a_1&a_2&\cdots&a_n\end{bmatrix}\begin{bmatrix}a_1\\a_2\\ \vdots\\a_n\end{bmatrix} = \begin{bmatrix}a_1^2+a_2^2+\cdots+a_n^2\end{bmatrix} $$ so $a^Tb$ is equivalent to $a\cdot b$, while $$aa^T = \begin{bmatrix}a_1\\a_2\\ \vdots\\a_n\end{bmatrix}\begin{bmatrix}a_1&a_2&\cdots&a_n\end{bmatrix} = \begin{bmatrix}a_1^2&a_1a_2&\cdots&a_1a_n \\ a_2a_1&a_2^2&\cdots&a_2a_n \\ \vdots&\vdots&\ddots&\vdots \\ a_na_1&a_na_2&\cdots&a_n^2\end{bmatrix}.$$ Note in particular that $b\cdot a=b^Ta$, not $ba^T$, as the latter is also an $n\times n$ matrix.
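
If it helps to see these shapes concretely, here is a quick NumPy sketch (the particular vectors are just an example; any $n\times1$ columns behave the same way):

```python
import numpy as np

# Example column vectors, treated as n x 1 matrices (n = 3 here).
a = np.array([[1.0], [2.0], [3.0]])
b = np.array([[4.0], [5.0], [6.0]])

inner = a.T @ b   # 1 x 1 matrix holding the same value as the dot product
outer = a @ a.T   # n x n matrix

print(inner.shape, inner.item())      # (1, 1) 32.0
print(outer.shape)                    # (3, 3)
print(np.dot(a.ravel(), b.ravel()))   # 32.0, i.e. a . b as a plain scalar
```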

The derivation of the projection might be easier to understand if you write it slightly differently. Start with dot products: $$p={a\cdot b\over a\cdot a}a={1\over a\cdot a}a(a\cdot b)$$ then replace the dot products with equivalent matrix products: $$p={1\over a^Ta}a(a^Tb).$$ This expression is a product of the scalar ${1\over a^Ta}$ with three matrices. Since matrix multiplication is associative, we can regroup this as $${1\over a^Ta}(aa^T)b.$$ This is a scalar times an $n\times n$ matrix times an $n\times1$ matrix, i.e., a vector.
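
To make the regrouping tangible, here is a small check (again with arbitrary example vectors) that the two groupings compute the same projection:

```python
import numpy as np

a = np.array([[1.0], [2.0], [3.0]])
b = np.array([[4.0], [5.0], [6.0]])

scale = (a.T @ a).item()          # the scalar a^T a

p1 = (a @ (a.T @ b)) / scale      # grouped as  a (a^T b) / (a^T a)
P  = (a @ a.T) / scale            # the matrix  a a^T / (a^T a)
p2 = P @ b                        # grouped as  (a a^T) b / (a^T a)

print(np.allclose(p1, p2))        # True, by associativity
```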

Addendum: The scalar factor can be absorbed into the $n\times n$ matrix $aa^T$; the resulting matrix $\pi_a$ represents orthogonal projection onto (the span of) $a$. That it is a projection is easy to verify: $$\pi_a^2 = \left({aa^T\over a^Ta}\right)^2 = {(aa^T)(aa^T)\over (a^Ta)(a^Ta)} = {a(a^Ta)a^T\over(a^Ta)(a^Ta)} = {(a^Ta)(aa^T)\over(a^Ta)(a^Ta)} = {aa^T\over a^Ta} = \pi_a,$$ again using associativity of matrix multiplication and the fact that $a^Ta$ is a scalar so commutes with matrices. In addition, $$\pi_aa = {aa^T\over a^Ta}a = {a^Ta\over a^Ta}a = a,$$ as expected.
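
Both identities are easy to confirm numerically (same example vector as before):

```python
import numpy as np

a = np.array([[1.0], [2.0], [3.0]])
P = (a @ a.T) / (a.T @ a).item()   # the projection matrix pi_a

print(np.allclose(P @ P, P))       # True: pi_a^2 = pi_a
print(np.allclose(P @ a, a))       # True: pi_a a = a
```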

In the above derivation of projection onto $a$, $b$ was an arbitrary vector, so for all $b$, $\pi_ab$ is some scalar multiple of $a$. In other words, the image (column space) of $\pi_a$ is spanned by $a$—it’s the line through $a$—and so the rank of $\pi_a$ is one. This can also be seen by examining $aa^T$ directly: each column is a multiple of $a$.
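
A quick numerical check of the rank-one claim:

```python
import numpy as np

a = np.array([[1.0], [2.0], [3.0]])
P = (a @ a.T) / (a.T @ a).item()

print(np.linalg.matrix_rank(P))    # 1
print(P[:, [0]] / P[0, 0])         # first column, rescaled, is a itself
                                   # (valid here since a[0] != 0)
```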

As a final note, the above derivation requires that the vectors and matrices be expressed relative to a basis that’s orthonormal with respect to the dot product. It’s possible to remove this restriction, but the expression for the projection matrix will be more complex.
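
To give a rough sense of what "more complex" means (a sketch, not part of the derivation above): if the inner product is instead $\langle u,v\rangle = u^TGv$ for some symmetric positive-definite matrix $G$, running the same argument produces the candidate matrix $aa^TG/(a^TGa)$, whose projection properties can be checked numerically. The particular $G$ below is an arbitrary assumed example:

```python
import numpy as np

a = np.array([[1.0], [2.0], [3.0]])
G = np.array([[2.0, 0.0, 0.0],    # an assumed symmetric positive-definite
              [0.0, 1.0, 0.5],    # matrix defining <u, v> = u^T G v
              [0.0, 0.5, 1.0]])

P = (a @ a.T @ G) / (a.T @ G @ a).item()

print(np.allclose(P @ P, P))      # True: still idempotent
print(np.allclose(P @ a, a))      # True: still fixes a
```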


If $a = \begin{bmatrix} x\\y\\z \end{bmatrix}$ and $b = \begin{bmatrix} u\\v\\w \end{bmatrix}$, then

$a^Tb = [xu + yv + zw]\;\;$ and $\langle a,b \rangle = xu + yv + zw$

The difference between the one-by-one matrix $[xu+yv+zw]$ and the scalar $xu+yv+zw$ is so trivial that it is usually ignored.

Note also that $ab^T = \begin{bmatrix} xu & xv & xw\\ yu & yv & yw\\ zu & zv & zw \end{bmatrix}$.
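
The same example, checked with NumPy (substituting arbitrary numbers for $x,y,z,u,v,w$):

```python
import numpy as np

a = np.array([[1.0], [2.0], [3.0]])   # (x, y, z)
b = np.array([[4.0], [5.0], [6.0]])   # (u, v, w)

print(a.T @ b)    # [[32.]], the 1 x 1 matrix [xu + yv + zw]
print(a @ b.T)    # the 3 x 3 matrix of products x*u, x*v, ..., z*w
```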