What is contracting a tensor actually doing?

My favorite way to interpret the trace is as the average value of an associated quadratic form. Here's how that works.

Let $V$ be an $n$-dimensional vector space, and let $T$ be a tensor on $V$. First let's consider the case in which $T$ is a tensor of type $(1,1)$, which we can also interpret as a linear map from $V$ to itself. Choose an inner product $\left< \cdot,\cdot\right>$ on $V$, and define the associated quadratic form $Q\colon V\to\mathbb R$ by $$Q(x) = \left< x, Tx \right>.$$ Then a computation shows that the trace of $T$ is $n$ times the average value of $Q$ over the unit sphere in $V$.

(Here's a sketch of how this computation is done: Choose an orthonormal basis for $V$ and express $x$ in terms of that basis as an $n$-tuple $(x^1,\dots,x^n)$, with $(x^1)^2 + \dots + (x^n)^2 = 1$. Then $$\int_{\mathbb S^{n-1}} Q(x)\,dA = \sum_{i,j}T_i^j\int_{\mathbb S^{n-1}} x^ix^j\,dA. $$ The integrals on the right with $i\ne j$ are all zero, while the ones with $i=j$ are all the same, as can be seen by renaming the variables; adding them all up yields the volume of the sphere, so each integral is $1/n$th of the volume.)
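For anyone who wants to sanity-check the claim numerically, here is a minimal Monte Carlo sketch. It assumes NumPy, the standard inner product on $\mathbb R^n$, and an arbitrary random matrix standing in for $T$; the function name `sphere_average_of_Q` is made up for this illustration.

```python
import numpy as np

def sphere_average_of_Q(T, num_samples=200_000, seed=0):
    """Monte Carlo estimate of the average of Q(x) = <x, Tx> over the unit sphere."""
    rng = np.random.default_rng(seed)
    n = T.shape[0]
    # Normalizing standard Gaussian vectors gives points uniform on S^{n-1}.
    x = rng.standard_normal((num_samples, n))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    # Q(x) = sum_{i,j} x^i T_ij x^j, evaluated for every sample at once.
    q = np.einsum("si,ij,sj->s", x, T, x)
    return q.mean()

rng = np.random.default_rng(1)
T = rng.standard_normal((4, 4))   # arbitrary, non-symmetric example (an assumption)
n = T.shape[0]
estimate = n * sphere_average_of_Q(T)
# The two numbers should agree up to sampling error.
print(estimate, np.trace(T))
```

For the identity matrix, $Q\equiv 1$ on the sphere, so the estimate is exactly $n$, which is a quick way to remember that the factor is $n$ and not $1/n$.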

It's interesting to note that, because the trace is independent of basis, this result doesn't depend on the inner product chosen, even though the quadratic form will change depending on the inner product.

The quadratic form may seem to capture only part of the information encoded in $T$. But note that once an inner product is chosen, there's a one-to-one correspondence between linear maps $T\colon V\to V$ and bilinear forms $B_T\colon V\times V\to\mathbb R$, given by $B_T(x,y) = \left<x,Ty\right>$. Each such bilinear form decomposes into a symmetric part and a skew-symmetric part: $B_T = B_T^{\text{sym}}+B_T^{\text{skew}}$. The trace of the skew part is zero, so the trace only "sees" the symmetric part; and the symmetric part can be reconstructed from the quadratic form by using the polarization identity $B_T^{\text{sym}}(x,y) = \tfrac14(Q(x+y)-Q(x-y))$.
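The symmetric/skew split is easy to check on a concrete matrix. The sketch below assumes NumPy and the standard inner product, so that the symmetric and skew parts of $B_T$ are just the symmetric and skew parts of the matrix; the random $T$, $x$, $y$ are arbitrary test data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
T = rng.standard_normal((n, n))   # arbitrary example matrix (an assumption)

# With the standard inner product, B_T(x, y) = x . (T y), so the symmetric and
# skew parts of B_T correspond to the symmetric and skew parts of the matrix.
T_sym = 0.5 * (T + T.T)
T_skew = 0.5 * (T - T.T)

def Q(x):
    """Quadratic form Q(x) = <x, Tx>."""
    return x @ T @ x

x = rng.standard_normal(n)
y = rng.standard_normal(n)

# Polarization recovers the symmetric part of B_T from Q.
polarized = 0.25 * (Q(x + y) - Q(x - y))
print(np.isclose(polarized, x @ T_sym @ y))

# The skew part contributes nothing to the trace or to Q.
print(np.isclose(np.trace(T_skew), 0.0))
print(np.isclose(x @ T_skew @ x, 0.0))
```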

Now if $T$ is a tensor of type $(k,l)$, the contraction on any pair of indices yields a tensor of type $(k-1,l-1)$, whose value on any set of arguments $x_1,\dots,x_{k-1}, x_1^*,\dots,x_{l-1}^*$ is just $n$ times the average value of the quadratic form determined by the $(1,1)$-tensor $T(x_1,\dots,x_{k-1},\ \cdot\ , x_1^*,\dots,x_{l-1}^*,\ \cdot\ )$.
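To make the general case concrete, here is a rough sketch with a three-slot tensor stored as a NumPy array `A[i, j, k]`, contracting the first two slots. The array, the argument `w`, and the sample count are all arbitrary choices for illustration, and an orthonormal basis is assumed so that index positions can be ignored.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
# A tensor with three index slots, stored as A[i, j, k]; we contract slots i and j.
A = rng.standard_normal((n, n, n))

# The contraction leaves one free slot: contracted[k] = sum_i A[i, i, k].
contracted = np.einsum("iik->k", A)

# Fill the remaining slot with an argument w to get a (1,1)-tensor M.
w = rng.standard_normal(n)
M = np.einsum("ijk,k->ij", A, w)

# Average the quadratic form of M over (approximately) uniform unit vectors.
v = rng.standard_normal((200_000, n))
v /= np.linalg.norm(v, axis=1, keepdims=True)
avg = np.einsum("si,ij,sj->s", v, M, v).mean()

# The first two values are equal; the third matches up to sampling error.
print(contracted @ w, np.trace(M), n * avg)
```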


It might be worth adding some material and corrections here, since this post comes up in searches. Firstly, the previous answer should read "the trace of $T$ is $n$ times the average value of $Q$ over the unit sphere," not $1/n$; this can be immediately seen by considering the identity matrix.

Here’s an alternative, possibly more explicit way of seeing this. If the tensor is defined on a real vector space with a positive definite inner product, then we can choose an orthonormal basis and ignore the raising and lowering of indices. The unit vectors satisfy $v^{a}v^{a}=1$ (summing over $a$). We want to calculate the average of $T_{ab}v^{a}v^{b}$ over these unit vectors. The average of $v^{a}v^{b}$ when $a\neq b$, e.g. $v^{1}v^{2}$, vanishes, since for every unit vector with a given value of $v^{1}v^{2}$ there is another, its reflection across the hyperplane $v^{1}=0$, with the negative of this value; therefore the average of e.g. $T_{12}v^{1}v^{2}$ vanishes as well. The average of $v^{a}v^{b}$ when $a=b$, e.g. $v^{1}v^{1}$, must be $1/n$: by symmetry the $n$ averages of $v^{1}v^{1},\dots,v^{n}v^{n}$ are all equal, and they add up to $1$ because $v^{a}v^{a}=1$ for each vector. Therefore the average of $T_{11}v^{1}v^{1}$ is $T_{11}/n$, and the sum of these terms, $T_{aa}/n$, is the average of $T_{ab}v^{a}v^{b}$. The contraction can thus be viewed as "$n$ times the average of the tensor applied to the unit vectors."
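In symbols, the argument above can be summarized as follows, writing $\langle\,\cdot\,\rangle_{\mathbb S^{n-1}}$ for the average over unit vectors (this bracket notation is introduced here only for the summary): $$\left\langle v^{a}v^{b}\right\rangle_{\mathbb S^{n-1}} = \frac{1}{n}\,\delta^{ab} \quad\Longrightarrow\quad \left\langle T_{ab}v^{a}v^{b}\right\rangle_{\mathbb S^{n-1}} = T_{ab}\left\langle v^{a}v^{b}\right\rangle_{\mathbb S^{n-1}} = \frac{1}{n}\,T_{aa}.$$ As a check, for $T_{ab}=\delta_{ab}$ the left-hand side is the average of $v^{a}v^{a}=1$, and the right-hand side is $n/n = 1$.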