Why does polynomial factorization generalize to matrices

This follows simply from the universal property of polynomial rings, which implies that any polynomial equation in $\, R[x]\,$ will persist when "evaluated" into any ring where the images of the constants commute with the image of $x$ (which is precisely the condition necessary for such a map to be a ring homomorphism).

Indeed, a polynomial ring is designed precisely to have this property, i.e. it is the most general ("free") ring that contains $\,R\,$ and a new element $\,x\,$ that commutes with all elements of $R$. Because we use only the ring axioms and constant commutativity when proving polynomial equations, such proofs persist in said ring images where constant commutativity persists.

This is true in your example because each constant $\,r\,$ maps to the constant matrix $\,rI,\,$ which commutes with $\,T = $ image of $x$.

This implies that all familiar polynomial equations (e.g. Binomial Theorem and difference of squares factorization) persist to be true when evaluated into any ring where the constants commute with the indeterminates. Ditto for many other ubiquitous polynomial equations, e.g. cyclotomic polynomial factorizations, polynomial Bezout identities for the gcd, resultants, etc. Therefore such equations represent universal laws (identities), modulo said constant commutativity.
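As a concrete sanity check, here is a small SymPy sketch (the matrix $T$ below is an arbitrary illustration, not taken from the question) verifying that two such familiar identities persist when $x$ is replaced by a matrix and each constant $c$ by $cI$:

```python
from sympy import Matrix, eye

# An arbitrary matrix playing the role of the image of x.
T = Matrix([[1, 2], [3, 4]])
I = eye(2)

# Difference of squares: x^2 - 4 = (x - 2)(x + 2), evaluated at x = T.
lhs = T**2 - 4*I
rhs = (T - 2*I) * (T + 2*I)
assert lhs == rhs

# Binomial theorem: (x + 3)^2 = x^2 + 6x + 9, evaluated at x = T.
assert (T + 3*I)**2 == T**2 + 6*T + 9*I
```

The assertions pass because $T$ commutes with every scalar matrix $cI$, which is exactly the constant-commutativity hypothesis above.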

These ideas are brought to the fore when one studies ($R$-)algebras, which are rings containing a central image of $R$, i.e. where the images of elements of $R$ commute with everything. Any polynomial equation that holds true in $\,R[x_1,\ldots,x_n]\,$ will persist to be true when evaluated into any $R$-algebra, i.e. it is an identity (law) of $R$-algebras. In fact it is easy to show that an equation holds true in $\,R[x_1,\ldots,x_n]\,$ iff it is true in all $R$-algebras. Hence the equations that hold true in $\,R[x_1,\ldots,x_n]\,$ are precisely the identities (universal laws) of $R$-algebras.


What matters is that the matrices involved, namely powers of $T$, commute with each other. With that in mind, the legitimacy of the factorisation should be clear: just think about expanding the brackets using the associative and distributive properties of matrix multiplication. A more sophisticated argument can be obtained by viewing the equation in terms of an action of the polynomial ring $\mathbb{C}[x]$, in which factorisation is more familiar.
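To see that commutativity really is the crux, here is a short SymPy sketch (the matrices are arbitrary illustrations): the difference-of-squares expansion fails for two non-commuting matrices, but always holds for powers of a single matrix $T$, since those commute with each other:

```python
from sympy import Matrix, eye

# Two matrices that do not commute.
A = Matrix([[0, 1], [0, 0]])
B = Matrix([[0, 0], [1, 0]])
assert A*B != B*A

# Expanding (A - B)(A + B) gives A^2 + AB - BA - B^2, and AB - BA != 0 here,
# so the familiar factorization fails:
assert (A - B)*(A + B) != A**2 - B**2

# Powers of a single matrix T always commute, so the factorization holds.
T = Matrix([[1, 1], [0, 2]])
I = eye(2)
assert (T - I)*(T + I) == T**2 - I
```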


An explanation is that the polynomials in a given matrix $A$ constitute a commutative subring $\mathbb{K}[A]$ of the ring of matrices, which can be viewed as a homomorphic image of the ring of polynomials $\mathbb{K}[X]$; thus, within this subring, you work exactly as in $\mathbb{K}[X]$.

In fact, $\mathbb{K}[A]$ is isomorphic to the quotient ring $\mathbb{K}[X]/(m(X))$ where $m(X)$ is the minimal polynomial of the matrix $A$.
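This isomorphism can be tested numerically: reducing a polynomial modulo $m(X)$ before evaluating at $A$ gives the same matrix as evaluating directly. A SymPy sketch (the Jordan block below is an arbitrary example whose minimal polynomial $(X-2)^2$ is hard-coded, not computed):

```python
from sympy import Matrix, zeros, symbols, Poly, rem

X = symbols('X')
A = Matrix([[2, 1], [0, 2]])   # Jordan block; minimal polynomial (X - 2)^2
m = (X - 2)**2

def evaluate(p, M):
    """Evaluate a polynomial in X at the matrix M (constants become c*I)."""
    n = M.rows
    result = zeros(n, n)
    for (k,), c in Poly(p, X).terms():
        result += c * M**k
    return result

assert evaluate(m, A) == zeros(2, 2)       # m(A) = 0

p = X**5 + 3*X + 1
r = rem(p, m, X)                            # p reduced mod m, degree <= 1
assert evaluate(p, A) == evaluate(r, A)     # same element of K[A]
```

In other words, working modulo $m(X)$ in $\mathbb{K}[X]$ loses nothing when the target is $\mathbb{K}[A]$.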

Edit: let us take an example (the matrix has been borrowed from https://www.youtube.com/watch?v=FecegfvA-Pg).

Consider matrix $A=\begin{pmatrix}0&-2&-2\\1&3&1\\0&0&2\end{pmatrix}$ whose characteristic polynomial is $$c(X)=X^3-5X^2+8X-4$$

Using the Cayley-Hamilton theorem, one has $$c(A)=A^3-5A^2+8A-4I=0$$ (think to replace the constant $-4$ by $-4I$!), otherwise said, each time you meet $A^3$ in a computation, you can replace it by $5A^2-8A+4I$. This leads to a systematic degree lowering: any polynomial of any degree in $A$ can be brought to a (unique) form as an (at most) 2nd degree polynomial, for example $$A^4=A\cdot A^3=5A^3-8A^2+4A=5(5A^2-8A+4I)-8A^2+4A=17A^2-36A+20I.$$ But in fact, in this case there is a lower degree combination of powers of $A$ that is annihilated, more precisely $$m(A)=A^2-3A+2I=0 \ \ \ (1)$$ ($m(X)=X^2-3X+2$ is called the minimal polynomial). Thus in fact, the degree lowering means that any polynomial $P(A)$ in $A$ can be written as a first degree polynomial $\pi(A)$. This transformation can be considered as a linear mapping, i.e., a homomorphism between linear spaces (note that I have changed $A$ into $X$).
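A quick SymPy check of these relations (here $c(X)=X^3-5X^2+8X-4$ is the characteristic polynomial of this particular $A$, and $m(X)=X^2-3X+2$ its minimal polynomial):

```python
from sympy import Matrix, eye, zeros

A = Matrix([[0, -2, -2], [1, 3, 1], [0, 0, 2]])
I = eye(3)

# Cayley-Hamilton: c(A) = A^3 - 5A^2 + 8A - 4I = 0
assert A**3 - 5*A**2 + 8*A - 4*I == zeros(3, 3)

# Degree lowering via c: A^4 = 17A^2 - 36A + 20I
assert A**4 == 17*A**2 - 36*A + 20*I

# The minimal polynomial m(X) = X^2 - 3X + 2 lowers the degree further:
assert A**2 - 3*A + 2*I == zeros(3, 3)
assert A**4 == 15*A - 14*I     # every P(A) reduces to degree <= 1
```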

$$\varphi: \ \ \begin{cases}\mathbb{K}[X] \longrightarrow \mathbb{K}_1[X]\\ P(X) \longmapsto \pi(X)\end{cases}$$

(denoting by $\mathbb{K}_1[X]$ the vector space of polynomials of degree at most 1)

Note that the kernel of this mapping is the set $M$ of polynomial multiples of $m(X)$. You may know that taking classes modulo the kernel leads to an isomorphism (but maybe not yet). The definition of $M$ sounds like a principal ideal. This is not astonishing because the linear mapping $\varphi$ can also - fruitfully - be considered as a ring mapping (homomorphism between rings) with the following multiplication rule (because of relationship (1), i.e. $X^2 \equiv 3X-2$): $$(aX+b)(a'X+b')=aa'(3X-2)+(ab'+a'b)X+bb'=(3aa'+ab'+a'b)X+(bb'-2aa')$$
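This multiplication rule is easy to test: a SymPy sketch (the coefficient values are arbitrary) comparing it against ordinary polynomial multiplication reduced modulo $m(X)$:

```python
from sympy import symbols, expand, rem

X = symbols('X')
m = X**2 - 3*X + 2      # minimal polynomial, so X^2 is identified with 3X - 2

def mul(ab, ab2):
    """Multiply aX+b and a'X+b' using the rule induced by X^2 = 3X - 2."""
    a, b = ab
    a2, b2 = ab2
    return (3*a*a2 + a*b2 + a2*b, b*b2 - 2*a*a2)

# Compare with polynomial multiplication in K[X] followed by reduction mod m.
a, b, a2, b2 = 2, 5, -1, 3
product = rem(expand((a*X + b) * (a2*X + b2)), m, X)
ra, rb = mul((a, b), (a2, b2))
assert product == ra*X + rb

# Sanity check of the defining relation: X * X = 3X - 2.
assert mul((1, 0), (1, 0)) == (3, -2)
```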

Taking the quotient by the kernel, just as for vector spaces, yields an isomorphism: $\mathbb{K}[X]/(m(X))\cong\mathbb{K}[A]$.

Remark: the unifying structure for vector spaces that are also rings, with a certain compatibility relationship between the two sets of rules, is that of an algebra.