Converting between matrix multiplication and tensor contraction

Slogan: Matrices are a tool to compute sums; tensors tell you which sums make sense.

When you convert between rank-2 tensors and matrices, the decision as to which index of the tensor labels the rows and which one labels the columns is purely conventional. Matrix multiplication is no more than a convenient way to write products of the form

$$K(i,k) = \sum\nolimits_j M(i,j)N(j,k),$$

where I have consciously refrained from using indices to label matrix elements; instead, the first argument labels rows and the second labels columns.

Imagine, for example, that you want to compute the contraction $A^{ij} B_{jk}$. Define the matrix $M$ by $M(i,j) = A^{ij}$, the matrix $N$ by $N(j,k) = B_{jk}$ and the matrix $K$ by $K(i,k) = A^{ij}B_{jk}$. Then match the definitions of tensor contraction* and matrix multiplication to see that $$K = MN.$$ Do the multiplication and read off the components of the contraction using the definition of $K$.
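
If it helps to see this with actual numbers, here is a minimal NumPy sketch of the recipe (the array names and shapes are mine, chosen arbitrarily to mirror the symbols above): the contraction is computed once as the explicit sum over $j$ and once as the matrix product $K = MN$.

```python
import numpy as np

# Components of A^{ij} and B_{jk}, stored as plain arrays.
A = np.random.rand(3, 4)   # A[i, j] holds A^{ij}
B = np.random.rand(4, 5)   # B[j, k] holds B_{jk}

# Matrices: the first argument labels rows, the second labels columns.
M = A                      # M(i, j) = A^{ij}
N = B                      # N(j, k) = B_{jk}

# The contraction written out as the sum K(i, k) = sum_j M(i, j) N(j, k).
K_sum = np.zeros((3, 5))
for i in range(3):
    for k in range(5):
        K_sum[i, k] = sum(M[i, j] * N[j, k] for j in range(4))

# The same numbers obtained as a single matrix product K = M N.
K = M @ N

assert np.allclose(K, K_sum)
```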

Now define $\tilde M(j,i) = A^{ij}$, $\tilde N(j,k) = B_{jk}$, $\tilde K(k,i) = A^{ij}B_{jk}$. The components of the contraction you want and the tensors you know are still all contained in $\tilde K$, $\tilde M$ and $\tilde N$, but packaged differently. Matching up the definitions of multiplication and contraction (and transpose) once again now gives $$\tilde K^T = \tilde M^T\tilde N\quad\text{or, equivalently,}\quad\tilde K = \tilde N^T\tilde M.$$
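
The second packaging, checked numerically (same caveats as above, assuming NumPy):

```python
import numpy as np

A = np.random.rand(3, 4)   # A[i, j] holds A^{ij}
B = np.random.rand(4, 5)   # B[j, k] holds B_{jk}

# The alternative packaging of the same components.
M_t = A.T                  # M~(j, i) = A^{ij}
N_t = B                    # N~(j, k) = B_{jk}

# K~(k, i) = A^{ij} B_{jk}, obtained as K~ = N~^T M~ ...
K_t = N_t.T @ M_t

# ... which is just the transpose of K = M N from before.
K = A @ B
assert np.allclose(K_t, K.T)
```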

Look at all this again. You computed the same tensor contraction, twice, using different products of different matrices. (Of course, in fact $\tilde M = M^T$, $\tilde N = N$ and $\tilde K = K^T$.) In reality you would conserve letters, so my $M$ would also be called $A$, but this is nothing more than an abuse of notation. Doing the sums is all matrix products are good for. The meaning of the sums has to be obtained from elsewhere, usually from the tensor contractions they represent. A “matrix” is not a geometric object (and thus can’t be a physical object either).

On the other hand, a tensor is a geometric object. I will not try to describe what tensors really are (unless you ask for it in the comments), but as an appetizer, try to contemplate the difference between

  • vectors $u, v\in V$ and scalar-valued linear functions $\phi, \psi : V\to\mathbf R$, given that $\phi(u)$ and $\psi(v)$ make sense but $u(v)$ and $\phi(\psi)$ don’t; now remember that things like $\phi_iu^i$ are well-defined but those like $u^iv^i$ are not;
  • linear maps $A, B : V\to V$ such as rotations and bilinear functions $f, g : V\times V\to\mathbf R$, given that $A(u)$ is a vector, $f(u,v)$ is a scalar, $A\circ B$ is a linear map, but $A(u,v)$ and $f\circ g$ are meaningless; now remember you can contract $A^i_ju^j$, $f_{ij}u^iv^j$ and $A^i_jB^j_k$ (and look at the arrangement of free indices) but neither $A^i_ju^iv^j$ nor $f_{ij}g_{jk}$.

* I expect that your tensors and their contractions are defined in the physicists' tradition as collections of coordinates with incomprehensible properties involving large sums. This definition is decidedly not the most easily understood one; try searching or asking on MathOverflow for the one using universal properties.


As @AccidentalFourierTransform said, it's a mistake to try to think of tensors as matrices, as you rapidly end up going mad when you need to compute $A^{\alpha\beta\gamma\delta}B_{\beta\delta}$ or something. That said, $A^{\alpha\beta}B_{\alpha\gamma}$ 'looks like' $A^TB$.
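
A hedged NumPy sketch of both points (names and shapes are made up): the four-index contraction has no single matrix-product form, while the two-index one really is just `A.T @ B` on the component arrays.

```python
import numpy as np

# A^{abcd} B_{bd}: contract the 2nd and 4th indices, leaving two free indices.
# There is no way to write this as one matrix product, but einsum states it directly.
A4 = np.random.rand(3, 3, 3, 3)
B2 = np.random.rand(3, 3)
C = np.einsum('abcd,bd->ac', A4, B2)

# A^{ab} B_{ag}: contract the first index of each; on components this is A^T B.
A2 = np.random.rand(3, 3)
B3 = np.random.rand(3, 3)
assert np.allclose(np.einsum('ab,ag->bg', A2, B3), A2.T @ B3)
```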

The right way to think of tensors is as multilinear functions, because that is what they are, in fact. In the index notation each index then corresponds to a slot in the function. So a $(2,2)$ tensor $T$ is a function $T(\_,\_;\_,\_)$ where the first two arguments want one-forms and the last two want vectors. This then tells you how to write the indices of the components: not above each other, but in the order of the arguments, so $T^{\alpha\beta}{}_{\gamma\delta}$, and not $T^{\alpha\beta}_{\gamma\delta}$, which is ambiguous as to argument order.
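
Here is a minimal sketch of that point of view (NumPy again; the function name and shapes are illustrative, not standard): a $(2,2)$ tensor is a multilinear function of two one-forms and two vectors, and its component array is indexed in exactly the argument order.

```python
import numpy as np

# Component array of a (2,2) tensor: axes ordered exactly as the arguments of T(_,_;_,_).
T = np.random.rand(3, 3, 3, 3)

def T_as_function(phi, psi, u, v):
    """Evaluate T(phi, psi; u, v) for one-forms phi, psi and vectors u, v."""
    # T^{ab}{}_{cd} phi_a psi_b u^c v^d : each index is one slot of the function.
    return np.einsum('abcd,a,b,c,d->', T, phi, psi, u, v)

phi, psi = np.random.rand(3), np.random.rand(3)
u, v = np.random.rand(3), np.random.rand(3)

# Multilinearity in one slot, e.g. the first: T(2*phi + phi2, psi; u, v)
# equals 2*T(phi, psi; u, v) + T(phi2, psi; u, v).
phi2 = np.random.rand(3)
assert np.isclose(T_as_function(2*phi + phi2, psi, u, v),
                  2*T_as_function(phi, psi, u, v) + T_as_function(phi2, psi, u, v))
```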

It is really important to remember that tensors are multilinear functions, because if you don't do that you will inevitably fall into the terrible pit of index-gymnastics and become impaled on the poisoned spikes of contravariance and covariance, from which only a few ever escape. Remember there is geometry, not just indices and transformation rules.

One way of avoiding this trap while still being able to work in a convenient notation is to use the Penrose abstract-index notation. The most convenient thing of all is that you don't have to change anything to use it, though you might want to change the symbols you pick indices from.