Why does this "miracle method" for matrix inversion work?

The real answer is that the set of $n\times n$ matrices forms a Banach algebra - that is, a Banach space equipped with a multiplication compatible with the norm and the vector space structure (in particular, $\|AB\| \le \|A\|\,\|B\|$). In the reals, multiplication coincides with scaling, so the distinction doesn't matter and we don't think about it. But with matrices, scaling and multiplying are genuinely different operations. The point is that there is no miracle. Rather, the argument you gave only uses tools available in any Banach algebra (notably, you never used commutativity), so it generalizes nicely.

This kind of trick is used all the time to great effect. One classic example is proving that $1-A$ is invertible whenever $\|A\|<1$. Take the geometric-series argument from real analysis, check that every step works in a Banach algebra, and you're done.


Think about how you derive the finite version of the geometric series formula for scalars. You write:

$$x \sum_{n=0}^N x^n = \sum_{n=1}^{N+1} x^n = \sum_{n=0}^N x^n + x^{N+1} - 1.$$

This can be written as $xS=S+x^{N+1}-1$. So you move the $S$ over, and you get $(x-1)S=x^{N+1}-1$. Thus $S=(x-1)^{-1}(x^{N+1}-1)$.

There is only one point in this calculation where commutativity could matter, and that is the step where we multiply both sides by $(x-1)^{-1}$. Above I was careful to multiply on the left, matching the fact that $x$ appears on the left in $xS$. Thus, provided we perform this one multiplication on the left, everything we did works when $x$ is an element of any ring with identity in which $x-1$ has a multiplicative inverse.

As a result, if $A-I$ is invertible, then

$$\sum_{n=0}^N A^n = (A-I)^{-1}(A^{N+1}-I).$$
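As a quick sanity check, one can verify this identity numerically; the sketch below uses an arbitrary $2\times 2$ matrix chosen so that $A-I$ is invertible, and multiplies the inverse on the left as the derivation requires.

```python
import numpy as np

# Numerical check of the finite geometric-sum identity
#   sum_{n=0}^N A^n = (A - I)^{-1} (A^{N+1} - I)
# A is an arbitrary example with A - I invertible (det(A - I) = 0.51).
A = np.array([[0.2, 0.5], [0.1, 0.3]])
I = np.eye(2)
N = 10

# Left-hand side: partial sum of powers of A.
S = sum(np.linalg.matrix_power(A, n) for n in range(N + 1))

# Right-hand side: closed form, inverse multiplied on the LEFT.
rhs = np.linalg.inv(A - I) @ (np.linalg.matrix_power(A, N + 1) - I)

print(np.allclose(S, rhs))  # True
```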

Moreover, if $\| A \| < 1$ (in any submultiplicative norm, e.g. an operator norm), then the $A^{N+1}$ term decays as $N \to \infty$. Hence the partial sums are Cauchy, and if the ring in question is also complete with respect to this norm, you obtain

$$\sum_{n=0}^\infty A^n = (I-A)^{-1}.$$

In particular, in this situation we conclude that if $\| A \| < 1$ then $I-A$ is invertible, with inverse given by the series above.
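A minimal sketch of this Neumann-series construction, using an arbitrary matrix with spectral norm below $1$ and accumulating the partial sums directly:

```python
import numpy as np

# Sketch: when ||A|| < 1, the partial sums of the Neumann series
# I + A + A^2 + ... converge to (I - A)^{-1}.
A = np.array([[0.2, 0.5], [0.1, 0.3]])   # arbitrary example, ||A||_2 < 1
I = np.eye(2)
assert np.linalg.norm(A, 2) < 1

S = np.zeros_like(A)
term = I.copy()
for _ in range(200):        # accumulate I + A + A^2 + ...
    S += term
    term = term @ A

print(np.allclose(S, np.linalg.inv(I - A)))  # True
```

The number of terms (200) is overkill here; since $\|A\|_2 \approx 0.62$, the tail decays geometrically and far fewer terms already give machine precision.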


The matrices commute. The rest is "functional calculus" (also called operator calculus) applied to $A$.

Think, for example, of how the calculation would look in a simultaneous eigenbasis for $A$ and $B$. When the matrices commute and are each diagonalizable, there is a basis in which both are diagonal (in general, commuting matrices can at least be simultaneously upper-triangularized). Then your operations are valid provided they are valid when applied to each eigenvalue, considered as a number.
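To illustrate, here is a sketch with two commuting matrices built as polynomials in the same diagonal matrix conjugated by a common basis change $P$ (all choices below are arbitrary examples); a scalar identity like the binomial expansion then holds because it holds eigenvalue by eigenvalue.

```python
import numpy as np

# Two commuting diagonalizable matrices share an eigenbasis: A and B
# are polynomials in the same diagonal D, in the same basis P.
P = np.array([[1.0, 1.0], [0.0, 1.0]])           # change of basis
D = np.diag([0.5, -0.3])
A = P @ D @ np.linalg.inv(P)                     # A = P D P^{-1}
B = P @ (D @ D + np.eye(2)) @ np.linalg.inv(P)   # B = P (D^2 + I) P^{-1}

print(np.allclose(A @ B, B @ A))   # True: they commute

# The binomial identity (A + B)^2 = A^2 + 2AB + B^2 requires AB = BA;
# it holds here because it holds for each pair of eigenvalues.
lhs = (A + B) @ (A + B)
rhs = A @ A + 2 * (A @ B) + B @ B
print(np.allclose(lhs, rhs))       # True
```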