Fast(est) and intuitive ways to look at matrix multiplication?

Perhaps the best way to look at matrix multiplication when you want to compute a product by hand is as follows:

(Illustration: the two matrices written corner-to-corner, with $B$ bumped up to the top-right of $A$ and the product $AB$ filling the lower-right block.)

(See also the comparable illustration on the Wikipedia page.)

When you're computing by hand, just bump the second matrix up to make room for the product in the lower-right corner.

Note that the upper-left corner must be a square, which re-confirms the requirement that "columns of $A$" = "rows of $B$"; moreover, you can see that $A\times B$ inherits its row dimension from $A$ and its column dimension from $B$.
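For a small, made-up example, the corner-to-corner layout looks like this:

$$ \begin{array}{c|c} & \begin{pmatrix} 1 & 0 \\ 2 & 3 \end{pmatrix} \\ \hline \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} & \begin{pmatrix} 5 & 6 \\ 2 & 3 \end{pmatrix} \end{array} $$

Each entry of the product sits at the crossing of a row of $A$ (to its left) and a column of $B$ (above it), which is exactly where you compute their dot product: for instance the top-left entry is $1 \cdot 1 + 2 \cdot 2 = 5$.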


Matrix multiplication is defined as it is so that it reflects the composition of linear maps. No more, no less.


There is already a really great answer on why matrix multiplication is defined as it is, so this shall be the only mention of it in this answer. Instead I will show you how I normally handle these multiplications, and why this particular way of doing them is better suited for hand or head calculations than others. $\newcommand{\vek}[1]{\boldsymbol{#1}}$

Before I really start, I will swiftly go over some notation, and before you get to the actual answer, I will explain how I do matrix-vector multiplications, since this will be used for the matrix-matrix multiplication.

Notation

Let $A \in K^{q \times n}$ and $B \in K^{n \times p}$ be two matrices and $\boldsymbol{\nu} \in K^n$ a vector, where $K$ is any field that tickles your fancy. I will write vectors in bold, to differentiate them more clearly from the numbers in a matrix.

Now for $A$ we can write $$ A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{q1} & a_{q2} & \dots & a_{qn} \\ \end{pmatrix} = \begin{pmatrix} & & & \\ \boldsymbol{a}_{1} & \boldsymbol{a}_{2} & \dots & \boldsymbol{a}_{n} \\ & & & \\ \end{pmatrix} $$ where we have defined the column vectors $\boldsymbol{a}_{j}$. To be precise, we have $$ \boldsymbol{a}_{j} = \begin{pmatrix} a_{1j} \\ \vdots \\a_{qj} \end{pmatrix} $$ We do the same thing with $B$ to get its column vectors $\boldsymbol{b}_{j}$.
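To make the notation concrete, here is a small, made-up $2 \times 3$ example: $$ \begin{pmatrix} 1 & 4 & 2 \\ 0 & 3 & 5 \end{pmatrix} = \begin{pmatrix} & & \\ \boldsymbol{a}_{1} & \boldsymbol{a}_{2} & \boldsymbol{a}_{3} \\ & & \\ \end{pmatrix} \quad\text{with}\quad \boldsymbol{a}_{1} = \begin{pmatrix} 1 \\ 0 \end{pmatrix},~ \boldsymbol{a}_{2} = \begin{pmatrix} 4 \\ 3 \end{pmatrix},~ \boldsymbol{a}_{3} = \begin{pmatrix} 2 \\ 5 \end{pmatrix} $$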


Matrix-Vector multiplication

Let's say you want to evaluate $A \boldsymbol{\nu} = \boldsymbol{\omega}$. Then you will normally find something like $$ \sum_{j=1}^n a_{ij} \nu_j = \omega_i ~~~~~~\text{for every} ~~~~i \in \{1,\dots,q\} \tag{1} $$ as one possible way to get $\vek{\omega}$. So practically, you calculate every component of $\boldsymbol{\omega}$ separately. Instead, I always do it (and I suggest you do as well) in the following way:

$$ A \boldsymbol{\nu} = \begin{pmatrix} & & & \\ \boldsymbol{a}_{1} & \boldsymbol{a}_{2} & \dots & \boldsymbol{a}_{n} \\ & & & \\ \end{pmatrix} \begin{pmatrix} \nu_{1} \\ \vdots \\ \nu_{n} \end{pmatrix} = \nu_1 \boldsymbol{a}_{1} + \nu_2 \boldsymbol{a}_{2} + \dots + \nu_n \boldsymbol{a}_{n} = \boldsymbol{\omega} \tag{2} $$ Here, you multiply every column vector $\vek{a}_j$ of $A$ by $\nu_j$, and add those resulting vectors up.
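For a quick, made-up numerical example of $(2)$: $$ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 \\ 6 \end{pmatrix} = 5 \begin{pmatrix} 1 \\ 3 \end{pmatrix} + 6 \begin{pmatrix} 2 \\ 4 \end{pmatrix} = \begin{pmatrix} 5 \\ 15 \end{pmatrix} + \begin{pmatrix} 12 \\ 24 \end{pmatrix} = \begin{pmatrix} 17 \\ 39 \end{pmatrix} $$ which agrees with the component-wise formula $(1)$: $1\cdot 5 + 2\cdot 6 = 17$ and $3\cdot 5 + 4\cdot 6 = 39$.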

So what is the difference?

$(1)$ If you do the calculation component-wise, then you calculate $\omega_1$ by taking $a_{11}$ and multiplying it by $\nu_1$, to which you add $a_{12}$ multiplied by $\nu_2$, and so on. Then you calculate $\omega_2$ by taking $\nu_1 a_{21}$, adding $\nu_2 a_{22}$, and so on, until you have calculated $\omega_q$, at which point you are done. So in every step (i.e. for every $\omega_i$), you have to go back to your vector $\boldsymbol{\nu}$, check again what $\nu_1$ was (because you have probably forgotten it by the time you got to $\nu_3$), look at what $a_{i1}$ is, and do the multiplication.

$(2)$ Using the column vectors of $A$, you check the value of $\nu_1$ once, multiply every component of $\boldsymbol{a}_1$ by it, and write down the resulting vector. You never have to think about $\nu_1$ again; you just continue with $\nu_2$, $\nu_3$, etc. And when all the $\nu_j \boldsymbol{a}_j$ are written down before you, you just squash (add) them together to get the desired result.

So by using $(2)$ you avoid a lot of going back to look at what $\nu_j $ was.

Matrix-Matrix multiplication

Now we get to the bit you asked for. If I have to calculate $AB$, I use the following fact

$$ AB = A \begin{pmatrix} & & & \\ \boldsymbol{b}_{1} & \boldsymbol{b}_{2} & \dots & \boldsymbol{b}_{p} \\ & & & \\ \end{pmatrix} = \begin{pmatrix} & & & \\ A \boldsymbol{b}_{1} & A \boldsymbol{b}_{2} & \dots & A \boldsymbol{b}_{p} \\ & & & \\ \end{pmatrix} \tag{3} $$ using $(2)$ to calculate the $A \boldsymbol{b}_{j}$'s.
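Here is a small, made-up example of $(3)$ in action, using $(2)$ for each column: $$ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} = \begin{pmatrix} 5 \begin{pmatrix} 1 \\ 3 \end{pmatrix} + 7 \begin{pmatrix} 2 \\ 4 \end{pmatrix} & 6 \begin{pmatrix} 1 \\ 3 \end{pmatrix} + 8 \begin{pmatrix} 2 \\ 4 \end{pmatrix} \end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix} $$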

Let me convince you that the above is actually true, so that you can use it without concern.

We want to know what the $j$-th column vector of $AB$ is. Now if we take $\boldsymbol{e}_j$ to be the vector which has $0$ everywhere except for the $j$-th component, which is $1$, then by $(2)$, $(AB) \boldsymbol{e}_j$ is exactly the column vector we are looking for. But this also equates to $$ (AB) \boldsymbol{e}_j = A(B \boldsymbol{e}_j) = A \boldsymbol{b}_j $$ Again, the advantage of calculating the product like this is that you have to go back and re-check values less often. This time you will indeed have to check some values several times, but still less often than with the entry-wise $(AB)_{ij} = \sum_k a_{ik} b_{kj}$ way of doing the multiplication. There are some other cases where handling matrix multiplication in this way can be useful.


Uses

  • Let $A$ be a diagonalizable matrix with eigenvectors $\boldsymbol{s}_j$ and corresponding eigenvalues $\sigma_j$, let $D$ be its diagonal matrix, and $S$ the transformation matrix. What I could always remember was that $S$ contains the eigenvectors of $A$, and that either $ S^{-1} A S = D$ or $ S A S^{-1} = D$, but I could never remember which way around. Using $(3)$ to carry out the multiplications you get $$ S^{-1} A S = S^{-1} A \begin{pmatrix} \\ \boldsymbol{s}_{1} \dots \\ \\ \end{pmatrix} = S^{-1} \begin{pmatrix} \\ A \boldsymbol{s}_{1} \dots \\ \\ \end{pmatrix} = S^{-1} \begin{pmatrix} \\ \sigma_1 \boldsymbol{s}_{1} \dots \\ \\ \end{pmatrix} = \begin{pmatrix} \\ \sigma_1 S^{-1} \boldsymbol{s}_{1} \dots \\ \\ \end{pmatrix} = \begin{pmatrix} \\ \sigma_1 \boldsymbol{e}_{1} \dots \\ \\ \end{pmatrix} = D $$
  • If you want to evaluate a vector-matrix product, i.e. something like $\vek{\nu}^T A = \vek{\omega}$, where $\vek{\nu}$ is a column vector, then you already have everything you need to handle this case as well, since $$ \vek{\omega}^T = (\vek{\nu}^T A)^T = A^T \vek{\nu} $$ where you can use $(2)$, and to get $\vek{\omega}$ you just transpose again. If you carry out the calculation, you will see that this leads to the general result $$ \vek{\nu}^T A = \begin{pmatrix} \nu_1 & \dots & \nu_q \end{pmatrix} \begin{pmatrix} & \vek{a}_1 & \\ & \vdots & \\ & \vek{a}_q & \end{pmatrix} = \nu_1 \vek{a}_1 + \dots + \nu_q \vek{a}_q = \vek{\omega} $$ where the $\vek{a}_j$'s are now the row vectors of $A$ (see the small worked example after this list).

  • In some cases, when the matrix $A$ doesn't look too scary, you can just look at the matrix to find its inverse (if it is invertible, of course). Just ask yourself: what does the $j$-th column vector of the inverse have to accomplish? It gives you a linear combination of the column vectors of $A$, which has to add up to $\vek{e}_j$. So if you can see which linear combination does the job, you get one column of $A^{-1}$.
    Here is an easy example: $$ A = \begin{pmatrix} 0 & -1 \\ 1 & 2 \end{pmatrix} $$ To get $\vek{e}_1$ you take $ 2 \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix} -1 \cdot \begin{pmatrix} -1 \\ 2 \end{pmatrix} ~~$ and $~~\vek{e}_2 = 1 \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix} + 0 \cdot \begin{pmatrix} -1 \\ 2 \end{pmatrix} $
    So the inverse is $$ A^{-1} = \begin{pmatrix} 2 & 1 \\ -1 & 0 \end{pmatrix} $$
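As promised above, here is a quick, made-up check of the vector-matrix case using the row vectors of $A$: $$ \begin{pmatrix} 1 & 2 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ 5 & 6 \end{pmatrix} = 1 \begin{pmatrix} 3 & 4 \end{pmatrix} + 2 \begin{pmatrix} 5 & 6 \end{pmatrix} = \begin{pmatrix} 13 & 16 \end{pmatrix} $$ which matches the component-wise computation $1\cdot 3 + 2\cdot 5 = 13$ and $1\cdot 4 + 2\cdot 6 = 16$.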