Intuition for Products of Vectors and Matrices: $x^TAx$

To add to the earlier response: I interpret your question as asking for heuristics on how to manipulate matrices, not specifically for what matrix multiplication means.

I assume here that vectors are column vectors, so $x^T$ refers to a row vector, and that capital letters denote matrices. If $A=(a_{ij})$ then $A^T=(a_{ji})$, so transposition, that is, the interchange of rows and columns, corresponds to switching the indices! Remembering that, you can easily convert a symbolic matrix product into a sum over indexed expressions, manipulate it, and convert it back to a symbolic matrix product.
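
For example, the quadratic form in the title becomes $$ x^T A x = \sum_{i=1}^n \sum_{j=1}^n x_i a_{ij} x_j, $$ which is usually the easiest form to manipulate term by term.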

One useful trick is pre- and post-multiplication by diagonal matrices: pre-multiplication corresponds to operations on the rows, while post-multiplication corresponds to operations on the columns. That is, letting $D$ be a diagonal matrix, in $DA$ each row of $A$ is multiplied by the corresponding diagonal element of $D$, while in $AD$ each column of $A$ is multiplied by the corresponding diagonal element.
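
As a quick numerical check of this row/column rule, here is a minimal NumPy sketch (the matrix and diagonal entries are made up for illustration):

```python
import numpy as np

A = np.arange(1.0, 10.0).reshape(3, 3)   # an arbitrary 3x3 matrix
d = np.array([2.0, 3.0, 5.0])            # diagonal elements of D
D = np.diag(d)

# Pre-multiplication D @ A scales the rows of A by the diagonal elements ...
assert np.allclose(D @ A, d[:, None] * A)
# ... while post-multiplication A @ D scales the columns of A.
assert np.allclose(A @ D, A * d[None, :])
```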

Now an example to show how to use these manipulation tricks. Suppose $X$ is an $n\times n$ matrix such that there exists a basis for $\mathbb R^n$ consisting of eigenvectors of $X$ (we assume all elements are real here). That is, the eigenvalue/eigenvector equation $Xx=\lambda x$ has $n$ linearly independent solutions; call them (or some choice of them if they are not unique) $x_1, \dots, x_n$, with corresponding eigenvalues $\lambda_i$, the elements of the diagonal matrix $\Lambda$. Write $$ X x_i = \lambda_i x_i. $$ Now let $P$ be a matrix with the $x_i$ as columns. How can we write the equations above as one matrix equation? Since the constants $\lambda_i$ multiply columns, we know that in the matrix representation the diagonal matrix $\Lambda$ must post-multiply $P$. That is, we get $$ X P = P \Lambda. $$ Pre-multiplying both sides by the inverse of $P$, we get $$ P^{-1} X P = \Lambda. $$ That is, we can see that $X$ is similar to the diagonal matrix consisting of its eigenvalues.
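
To see the conclusion $P^{-1} X P = \Lambda$ numerically, here is a small NumPy sketch; it uses a symmetric matrix so that a real eigenvector basis is guaranteed to exist:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
X = B + B.T                        # symmetric, so a real eigenvector basis exists

eigvals, P = np.linalg.eigh(X)     # columns of P are the eigenvectors x_i
Lambda = np.diag(eigvals)

# X P = P Lambda: each column x_i of P satisfies X x_i = lambda_i x_i
assert np.allclose(X @ P, P @ Lambda)
# P^{-1} X P = Lambda: X is similar to the diagonal matrix of its eigenvalues
assert np.allclose(np.linalg.inv(P) @ X @ P, Lambda)
```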

One more example: if $S$ is a sample covariance matrix, how can we convert it to a sample correlation matrix? The correlation between variables $i$ and $j$ is the covariance divided by the product of the standard deviations of variable $i$ and variable $j$: $$ \text{cor}(X_i,X_j) = \frac{\text{cov}(X_i, X_j)} {\sqrt{\text{var}(X_i) \text{var}(X_j) }} $$

Looking at this with matrix eyes, we are dividing the $(i,j)$-element of the matrix $S$ by the square roots of the $i$th and $j$th diagonal elements! We are dividing each row of $S$ and each column of $S$ by the same diagonal elements, so this can be expressed as pre- and post-multiplication by the (same) diagonal matrix, the one holding the square roots of the diagonal elements of $S$. We have found $$ R = D^{-1/2} S D^{-1/2}, $$ where $R$ is the sample correlation matrix and $D$ is a diagonal matrix holding the diagonal elements of $S$.
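
As a sanity check of $R = D^{-1/2} S D^{-1/2}$, here is a minimal NumPy sketch (the data are simulated only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 3))  # 100 observations, 3 variables

S = np.cov(data, rowvar=False)                    # sample covariance matrix
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(S)))   # the diagonal matrix D^{-1/2}

R = D_inv_sqrt @ S @ D_inv_sqrt                   # pre- and post-multiply
assert np.allclose(R, np.corrcoef(data, rowvar=False))
```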

There are lots of applications of these kinds of tricks, and I find them so useful that textbooks should include them. One more example: now let $P$ be a permutation matrix, that is, an $n\times n$ matrix representing a permutation of $n$ symbols. Such a matrix has one 1 and $n-1$ zeros in each row and each column, and can be obtained by permuting (in the same way!) the rows and the columns of an identity matrix. Then $AP$ (since it is a post-multiplication) permutes the columns of $A$, while $PA$ permutes the rows of $A$.
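
A quick NumPy illustration of the permutation rule (with an arbitrarily chosen permutation):

```python
import numpy as np

A = np.arange(9.0).reshape(3, 3)
perm = [2, 0, 1]                  # an arbitrary permutation of {0, 1, 2}
P = np.eye(3)[perm]               # permutation matrix: rows of the identity, permuted

# Pre-multiplication P @ A permutes the rows of A ...
assert np.allclose(P @ A, A[perm, :])
# ... while post-multiplication A @ P permutes the columns (by the inverse permutation).
assert np.allclose(A @ P, A[:, np.argsort(perm)])
```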


Forgetting the coordinates, or even that these are matrices at all, can be a very efficient way of thinking in some circumstances.

$x^tAy$ can mean any bilinear mapping $\beta: (x,y) \mapsto x^tAy$, something like a 'generalized scalar product'. If you have fixed a basis $e_1,\ldots,e_n$, then $\beta(e_i,e_j) = e_i^tA e_j = a_{ij}$ (the element of $A$ in the $i$th row and $j$th column), and this information uniquely determines all the generalized scalar products.
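
Indeed, expanding by bilinearity, $$ \beta(x,y) = \beta\Big(\sum_i x_i e_i,\; \sum_j y_j e_j\Big) = \sum_{i,j} x_i y_j\, \beta(e_i, e_j) = \sum_{i,j} x_i a_{ij} y_j = x^t A y, $$ so the values on the basis vectors determine $\beta$ everywhere.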

Anyway, besides this, $n\times m$ matrices are more frequently identified with linear mappings $\mathbb R^m\to\mathbb R^n$, which are, roughly speaking, origin- and line-preserving geometric transformations. (The correspondence again comes from fixing a basis, usually the standard one.) Matrix multiplication then corresponds to the composition of these transformations.
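
A tiny NumPy check that matrix multiplication matches composition of the corresponding linear maps (the matrices are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3))   # a linear map R^3 -> R^2
B = rng.standard_normal((3, 4))   # a linear map R^4 -> R^3
x = rng.standard_normal(4)

# Applying B and then A is the same as applying the single map AB.
assert np.allclose(A @ (B @ x), (A @ B) @ x)
```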