Transpose of product of matrices

Here's an alternative argument. The main importance of the transpose (and this in fact defines it) is the formula $$Ax\cdot y = x\cdot A^\top y,$$ valid for all $x$ and $y$. (If $A$ is $m\times n$, then $x\in \Bbb R^n$, $y\in\Bbb R^m$, the left dot product is taken in $\Bbb R^m$ and the right dot product is taken in $\Bbb R^n$.)

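Now note that $$(AB)x\cdot y = A(Bx)\cdot y = Bx\cdot A^\top y = x\cdot B^\top(A^\top y) = x\cdot (B^\top A^\top)y.$$ This holds for all $x$ and $y$, and $(AB)^\top$ is the only matrix $M$ satisfying $(AB)x\cdot y = x\cdot My$ for all $x$ and $y$. Thus, $(AB)^\top = B^\top A^\top$.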
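If you want to see the chain of equalities on actual numbers, here is a minimal sketch in plain Python. The helper names (`dot`, `transpose`, `matvec`, `matmul`) and the particular matrices and vectors are my own choices for illustration, not anything canonical. Integer entries keep the comparison exact.

```python
def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    """Matrix-vector product M v."""
    return [dot(row, v) for row in M]

def matmul(M, N):
    """Matrix product M N: rows of M dotted with columns of N."""
    cols = transpose(N)
    return [[dot(row, col) for col in cols] for row in M]

# Illustrative data: A is 2x3, B is 3x4, so x lives in R^4 and y in R^2.
A = [[1, 2, 0],
     [-1, 3, 4]]
B = [[2, 1, 0, 5],
     [0, -1, 3, 1],
     [4, 0, 1, -2]]
x = [1, -2, 3, 0]
y = [5, 7]

# The three stages of the chain (AB)x.y = Bx.A^T y = x.B^T A^T y.
step1 = dot(matvec(matmul(A, B), x), y)
step2 = dot(matvec(B, x), matvec(transpose(A), y))
step3 = dot(x, matvec(matmul(transpose(B), transpose(A)), y))

print(step1, step2, step3)
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))
```

One pair $(x, y)$ proves nothing by itself, of course; the last line compares the two matrices directly.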


When you multiply $A$ and $B$, you are taking the dot product of each ROW of $A$ and each COLUMN of $B$.

The resulting dimension is $A_{\#row}\times B_{\#col}$, and after transposing, you have $B_{\#col}\times A_{\#row}$.

When you multiply $B^T$ and $A^T$, you take the dot product of each row of $B^T$ (a column of $B$) and each column of $A^T$ (a row of $A$).

Your resulting dimension is $B^T_{\#row}\times A^T_{\#col}$, which is just $B_{\#col}\times A_{\#row}$.

So the dimensions are identical, and each entry is correct: entry $(i,j)$ of $(AB)^T$ is entry $(j,i)$ of $AB$, the dot product of row $j$ of $A$ with column $i$ of $B$, and that is exactly entry $(i,j)$ of $B^T A^T$.
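The row/column bookkeeping above can be checked entry by entry. This is only a sketch under my own assumptions: matrices are stored as lists of rows, the helpers `shape`, `row`, and `col` are hypothetical names introduced here, and the $2\times 3$ and $3\times 4$ example matrices are arbitrary.

```python
def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(c) for c in zip(*M)]

def matmul(M, N):
    """Matrix product M N: rows of M dotted with columns of N."""
    cols = transpose(N)
    return [[dot(r, c) for c in cols] for r in M]

def shape(M):
    """(number of rows, number of columns)."""
    return (len(M), len(M[0]))

def row(M, j):
    """Row j of M."""
    return M[j]

def col(M, i):
    """Column i of M."""
    return [r[i] for r in M]

A = [[1, 2, 0],
     [-1, 3, 4]]            # 2 rows, 3 columns
B = [[2, 1, 0, 5],
     [0, -1, 3, 1],
     [4, 0, 1, -2]]         # 3 rows, 4 columns

AB_T = transpose(matmul(A, B))
BT_AT = matmul(transpose(B), transpose(A))

# Both sides have shape (B's #col, A's #row).
print(shape(AB_T), shape(BT_AT))

# Entry (i, j) on either side is (row j of A) . (column i of B).
entries_match = all(
    AB_T[i][j] == dot(row(A, j), col(B, i)) == BT_AT[i][j]
    for i in range(shape(B)[1])
    for j in range(shape(A)[0])
)
print(entries_match)
```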


If you know about dual spaces and maps, a conceptual proof can be obtained by observing that $A^T$ is the matrix of the dual map of $A$ (with respect to the dual bases) and that taking the dual is contravariant with respect to composition. That is, $(T \circ S)^* = S^* \circ T^*$.