Intuitively, *why* do defective matrices have extra eigenvalues?

Well, I would say there is only one eigenvalue: $1$. The point is, we usually say it's "repeated" or that it has "(algebraic) multiplicity $2$". Think about what it means to "repeat" an eigenvalue; under which circumstances do we list it twice, or more? And, when we do, how many times should we be listing it?

You seem to be counting based on the dimension of the eigenspace for the eigenvalue (or equivalently, the maximum number of linearly independent eigenvectors you can come up with). This is known as the geometric multiplicity. And, indeed, the geometric multiplicity of $1$ is $1$ in this case. Note how it does not agree with the exponent of the $\lambda - 1$ factor in the characteristic polynomial.
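If you want to see the two counts side by side, here is a small sketch of my own using SymPy (not part of the original argument): `eigenvects` reports each eigenvalue together with its algebraic multiplicity and a basis of its eigenspace.

```python
# Sketch: comparing algebraic and geometric multiplicity with SymPy.
from sympy import Matrix

M = Matrix([[1, 1],
            [0, 1]])

# eigenvects() yields (eigenvalue, algebraic multiplicity, eigenspace basis).
for val, alg_mult, basis in M.eigenvects():
    print(val, alg_mult, len(basis))   # prints: 1 2 1 -> geometric multiplicity is only 1
```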

The algebraic multiplicity counts the dimension of the generalised eigenspace. The generalised eigenspace is given by $$\operatorname{ker}(M - \lambda I)^n$$ where $M$ is an $n \times n$ matrix and $\lambda$ is an eigenvalue. Note how this contains $\operatorname{ker} (M - \lambda I)$ (if $(M - \lambda I)$ sends a vector to $0$, then applying it $n - 1$ more times will still send it to $0$), which is the (usual) eigenspace corresponding to $\lambda$. When $M$ is diagonalisable, this is always equal to $\operatorname{ker}(M - \lambda I)$, but when $M$ is defective, it is strictly larger than the eigenspace for at least one eigenvalue.

Now, as it turns out, the generalised eigenspaces always sum to $\Bbb{C}^n$, and as a consequence, we can always form a basis of generalised eigenvectors. There's a particularly nice class of such bases called Jordan bases; these are the next best things we can find to bases of eigenvectors. Instead of diagonalising a matrix, they turn it into Jordan Normal Form, an excellent consolation prize when a diagonal representation is denied to us. Jordan normal forms exist for every matrix, unlike diagonal forms!
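For what it's worth, computer algebra systems will happily hand you this form; here is a quick illustration of mine using SymPy's `jordan_form` (not part of the argument above):

```python
from sympy import Matrix

M = Matrix([[1, 1],
            [0, 1]])

P, J = M.jordan_form()         # M = P * J * P**(-1)
print(J)                       # Matrix([[1, 1], [0, 1]]) -- M is already a single Jordan block
print(P * J * P.inv() == M)    # True
```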

The algebraic multiplicities also equal the exponents of the corresponding factors in the characteristic polynomial. In fact, some authors take this as the definition, setting the characteristic polynomial to be $\prod_i (\lambda - \lambda_i)^{a_i}$ with $a_i$ the dimension of the generalised eigenspace of $\lambda_i$; the familiar determinant formula $\det(\lambda I - M)$ then becomes a theorem.
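As a quick check of this correspondence on the matrix at hand (my example, using SymPy's `charpoly`):

```python
from sympy import Matrix, symbols, factor

lam = symbols('lambda')
M = Matrix([[1, 1],
            [0, 1]])

# The characteristic polynomial factors as (lambda - 1)**2: the exponent 2
# is exactly the algebraic multiplicity of the eigenvalue 1.
print(factor(M.charpoly(lam).as_expr()))
```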


In the case of the $2 \times 2$ matrix presented, the eigenspace $\operatorname{ker} (M - I)$ is simply $\operatorname{span}\{(1, 0)\}$. However, if we compute $$\operatorname{ker} \left(\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right)^2 = \operatorname{ker} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}^2 = \operatorname{ker} \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = \Bbb{C}^2,$$ we see that the generalised eigenspace is $2$-dimensional, and the algebraic multiplicity is $2$.
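The same computation can be checked mechanically; here is a small verification of mine of the two kernels above, using SymPy's `nullspace`:

```python
from sympy import Matrix, eye

M = Matrix([[1, 1],
            [0, 1]])
N = M - eye(2)

print(N.nullspace())        # [Matrix([[1], [0]])]  -> the eigenspace is 1-dimensional
print((N**2).nullspace())   # two basis vectors     -> the generalised eigenspace is all of C^2
```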


I don't know if this qualifies as intuition, but in your case note that $$ \begin{pmatrix}1 & 1 \\ 0 & 1 \end{pmatrix}=\begin{pmatrix}1 & 0 \\ 0 & 1 \end{pmatrix}+\begin{pmatrix}0 & 1 \\ 0 & 0 \end{pmatrix} $$

One of these is a proper diagonal matrix, while the other is a shift matrix. Now, it is a theorem that for any nilpotent matrix $A$, i.e. a matrix such that $A^n=0$ for some $n$, there exists some basis $\mathcal{B}=\{b_1,\dots,b_n\}$ such that the matrix in this basis has the form of a shift matrix (i.e. $A b_i= b_{i+1}$ or $A b_i=0$ for every $i$).
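Concretely, for the shift matrix appearing above (a small check of mine, not from the original answer):

```python
from sympy import Matrix

A = Matrix([[0, 1],
            [0, 0]])

e1, e2 = Matrix([1, 0]), Matrix([0, 1])
print(A * e2 == e1)              # True: one basis vector is shifted onto the other...
print(A * e1 == Matrix([0, 0]))  # True: ...and that one is shifted to 0
print(A**2)                      # the zero matrix, so A is nilpotent
```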

There is another theorem stating that any matrix is the sum of a nilpotent matrix and one that is diagonalisable (as long as your scalars belong to an algebraically closed field, such as $\mathbb{C}$); moreover, the two pieces can be chosen to commute. Hence, take your general $M=D+A,$ where $D$ is diagonalisable and $A$ is nilpotent. Then, the above observations allow us to find some basis in which $D$ is diagonal and another basis in which $A$ is a shift matrix.

However, the magic of the Jordan Normal Form is that this can be done simultaneously: there exists a single basis with respect to which $D$ is diagonal and $A$ is a shift matrix. Thus, the obstruction to the geometric multiplicity of each eigenvalue matching its algebraic multiplicity is exactly this shift matrix $A$. If $A=0,$ then $M=D$ is diagonalisable and, of course, vice versa.
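For the $2 \times 2$ matrix in question the standard basis already does the job, so the split is easy to inspect directly (a sketch of mine, assuming SymPy):

```python
from sympy import Matrix, eye

M = Matrix([[1, 1],
            [0, 1]])
D = eye(2)        # the diagonal(isable) piece
A = M - D         # the shift piece

print(A**2)           # zero matrix: A is nilpotent
print(D*A == A*D)     # True: the two pieces commute
print(D + A == M)     # True: together they recover M
```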

In your case, you have something that is almost an eigenvector, namely $(0,1)$: instead of just producing a scalar multiple of itself, $M$ sends it to itself plus an honest-to-god eigenvector, $(1,0)$. Vectors like this are called generalised eigenvectors.
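You can watch this happen by applying $M - I$ to that almost-eigenvector (again, just my illustration):

```python
from sympy import Matrix, eye

M = Matrix([[1, 1],
            [0, 1]])
v = Matrix([0, 1])

print(M * v)                # (1, 1) = v + (1, 0): itself plus an eigenvector
print((M - eye(2)) * v)     # (1, 0): one application of M - I lands on a genuine eigenvector
print((M - eye(2))**2 * v)  # (0, 0): so v is a generalised eigenvector
```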

So to sum up: Geometrically speaking, you don't have as many actual eigenvectors as you would like because some shift is happening.