Taylor series of functions with matrix input

Let's first try to understand what happens in the case of polynomials: Take some polynomial function $f: \mathbb{R}\to \mathbb{R}$. We would like to define $f(X)$, where $X$ is a real $n\times n$ matrix for $n \in \mathbb{N}$. Since $f$ is a polynomial, we have $$f(x)=p_0x^0+p_1x+p_2x^2+\dots+p_mx^m$$ for some $m \in \mathbb{N}$ and coefficients $(p_k)_{k=0}^m \in \mathbb{R}^{m+1}$. Since $n \times n$ matrices, like real numbers, can be added and multiplied together, it is only natural to define $$f(X)=p_0X^0+p_1X^1+p_2X^2+\dots+p_mX^m,$$ where $X^0$ is the identity matrix. So for any polynomial function $f: \mathbb{R}\to \mathbb{R}$ we can define a corresponding matrix function $F: \mathbb{R}^{n\times n} \to \mathbb{R}^{n \times n}$.
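As a quick numerical illustration of the polynomial case, here is a minimal sketch (the helper name `polyval_matrix` is my own, and I'm assuming NumPy is available) that evaluates $p_0X^0+p_1X^1+\dots+p_mX^m$ directly:

```python
import numpy as np

def polyval_matrix(coeffs, X):
    """Evaluate p_0 X^0 + p_1 X^1 + ... + p_m X^m for a square matrix X.

    coeffs[k] is the coefficient p_k; X^0 is the identity matrix.
    """
    result = np.zeros_like(X, dtype=float)
    power = np.eye(X.shape[0])     # current power of X, starting at X^0 = I
    for p in coeffs:
        result += p * power
        power = power @ X          # advance to the next power of X
    return result

# f(x) = 1 + 2x + 3x^2 applied to a 2x2 matrix
X = np.array([[1.0, 1.0],
              [0.0, 2.0]])
F = polyval_matrix([1.0, 2.0, 3.0], X)
```

For this $X$ one can check by hand that $F = I + 2X + 3X^2$ entrywise.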

But we want more: We want to define $f(X)$ for some larger class of functions, where $f$ isn't necessarily a polynomial anymore. Let's consider power series, which in some sense are just "infinite polynomials". Take a function $f: \mathbb{R} \to \mathbb{R}$ which can be written as $$f(x)=\sum_{j=0}^\infty q_jx^j$$ for some $(q_j)_{j\in\mathbb{N}}\in \mathbb{R}^\mathbb{N}$, where the series of course has to converge for all $x\in \mathbb{R}$. Then, analogous to the polynomial case, we would like to define for an $n\times n$ matrix $X$ $$f(X):=\sum_{j=0}^\infty q_jX^j.$$ But we have to make sure that this series converges for all $X\in\mathbb{R}^{n\times n}$ as well, otherwise this expression doesn't make any sense! Now technically we need a norm on $\mathbb{R}^{n \times n}$ to talk about convergence, but the choice doesn't really matter because all norms on finite dimensional vector spaces are equivalent. Let's take the Frobenius norm $||X||:=\sqrt{\sum_{i,j=1}^n |X_{i,j}|^2}$ because it has the convenient property of being submultiplicative, that is $||XY||\leq||X||\,||Y||$ for all $X,Y \in \mathbb{R}^{n \times n}$, and that will be useful in the following argument:

$$\left\|\sum_{j=0}^{m+k}q_jX^j-\sum_{j=0}^{m}q_jX^j\right\|=\left\|\sum_{j=m+1}^{m+k}q_jX^j\right\|\leq \sum_{j=m+1}^{m+k}|q_j|\,||X^j|| \leq \sum_{j=m+1}^{m+k}|q_j|\,||X||^j \rightarrow 0$$ as $m\to \infty$, uniformly in $k\in\mathbb{N}$, since the power series $\sum_{j=0}^\infty q_jx^j$ was assumed to have an infinite radius of convergence, so it converges absolutely everywhere; in particular, the tails of $\sum_{j=0}^\infty |q_j|\,||X||^j$ go to $0$. Because we've shown that $(\sum_{j=0}^m q_jX^j)_{m \in \mathbb{N}}$ is a Cauchy sequence in the complete space $(\mathbb{R}^{n \times n}, ||\cdot||)$, the series converges for every $X$ (even absolutely).

So now we can define $f(X)$ for any function $f: \mathbb{R}\to\mathbb{R}$ given by an everywhere-convergent power series. In particular, if $f$ is an infinitely differentiable function whose Taylor series about some $x_0\in \mathbb{R}$ converges to $f$ on all of $\mathbb{R}$, then $f(X)$ is defined (after the shift $x \mapsto x-x_0$), and it is given by $$f(X)=\sum_{j=0}^\infty \frac{f^{(j)}(x_0)}{j!}(X-x_0I)^j.$$ By setting $f=\exp$, you get the matrix exponential you asked about in your question.
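The convergence of the partial sums can be checked numerically for $f=\exp$: a rough sketch (assuming NumPy and SciPy; `exp_series` is my own name) comparing a truncated series against SciPy's `expm`:

```python
import numpy as np
from scipy.linalg import expm

def exp_series(X, terms=60):
    """Partial sum  sum_{j < terms} X^j / j!  of the matrix exponential series."""
    total = np.zeros_like(X, dtype=float)
    term = np.eye(X.shape[0])          # X^0 / 0!
    for j in range(terms):
        total += term
        term = term @ X / (j + 1)      # X^{j+1} / (j+1)!
    return total

X = np.array([[0.0, 1.0],
              [-1.0, 0.0]])            # generator of rotations
approx = exp_series(X)
exact = expm(X)                        # reference implementation
```

For this $X$, $\exp(X)$ is the rotation matrix by one radian, and the partial sums match it to machine precision well before 60 terms.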

Now finally, if $X=\begin{bmatrix}x&1\\0&x\end{bmatrix}$, then after multiplying out the matrix powers one gets $$f(X)=\sum_{n=0}^\infty \frac{f^{(n)}(x_0)}{n!}\begin{bmatrix}(x-x_0)^n&n(x-x_0)^{n-1}\\0&(x-x_0)^n\end{bmatrix},$$ where the off-diagonal entry of the $n=0$ term is understood to be $0$. For $x=x_0$, only the first two terms survive and it reduces to the polynomial case $$f(X)=f(x)\begin{bmatrix}1&0\\0&1\end{bmatrix}+f'(x)\begin{bmatrix}0&1\\0&0\end{bmatrix}.$$ I think there is an error in your question, because for any $x_0$ this doesn't line up with the expression you gave.
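As a sanity check of this formula, here is a sketch (my own, assuming NumPy and SciPy) that sums the series for $f=\exp$ with $x_0=0$, where $f^{(n)}(x_0)=1$ for all $n$, and compares against `scipy.linalg.expm`:

```python
import numpy as np
from scipy.linalg import expm
from math import factorial, exp

x, x0, terms = 0.7, 0.0, 40
# Taylor coefficients of f = exp about x0 = 0: f^{(n)}(x0) = 1 for all n
F = np.zeros((2, 2))
for n in range(terms):
    coeff = 1.0 / factorial(n)                      # f^{(n)}(x0) / n!  for f = exp
    off = n * (x - x0) ** (n - 1) if n >= 1 else 0.0  # n (x-x0)^{n-1}, 0 for n = 0
    F += coeff * np.array([[(x - x0) ** n, off],
                           [0.0, (x - x0) ** n]])

X = np.array([[x, 1.0],
              [0.0, x]])
reference = expm(X)          # should equal e^x * [[1, 1], [0, 1]]
```

Both diagonal and off-diagonal series sum to $e^x$ here, consistent with $\exp(X)=e^x\begin{bmatrix}1&1\\0&1\end{bmatrix}$.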


I believe, as mentioned by mathreadler in his comment on the question itself, that there is indeed a typo and that the correct formula may be written

$f \left (\begin{bmatrix} x & 1 \\ 0 & x \end{bmatrix} \right ) = f(x)I + f'(x)N, \tag 0$

with $I$ and $N$ explained in what follows.

I assume $f(x)$ is represented by a Taylor series about $x = 0$:

$f(x) = f(0) + f'(0)x + \dfrac{1}{2}f''(0)x^2 + \ldots = \displaystyle \sum_0^\infty \dfrac{1}{n!} f^{(n)}(0) x^n; \tag 1$

we have

$\begin{bmatrix} x & 1 \\ 0 & x \end{bmatrix}^n = \left ( \begin{bmatrix} x & 0 \\ 0 & x \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \right )^n = \left ( x\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \right )^n; \tag 2$

setting

$I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \tag 3$

and

$N = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \; N^2 = 0, \tag 4$

we write

$\begin{bmatrix} x & 1 \\ 0 & x \end{bmatrix}^n = (xI + N)^n; \tag 5$

since $IN = NI$, (5) may be subject to the ordinary binomial expansion, and since $N^2 = 0$, the terms containing $N^2$ and higher powers of $N$ vanish; thus

$\begin{bmatrix} x & 1 \\ 0 & x \end{bmatrix}^n = (xI + N)^n = x^nI + nx^{n - 1}N; \tag 6$
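Identity (6) is easy to confirm numerically; a small sketch (assuming NumPy) checking it for one choice of $x$ and $n$:

```python
import numpy as np

x, n = 1.5, 7
I = np.eye(2)
N = np.array([[0.0, 1.0],
              [0.0, 0.0]])                 # nilpotent: N @ N == 0

lhs = np.linalg.matrix_power(x * I + N, n)
rhs = x ** n * I + n * x ** (n - 1) * N    # binomial expansion; N^2 and beyond vanish
```

The two sides agree to machine precision.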

if we substitute this into (1) we obtain

$f \left (\begin{bmatrix} x & 1 \\ 0 & x \end{bmatrix} \right ) = \displaystyle \sum_0^\infty \dfrac{1}{n!} f^{(n)}(0)(xI + N)^n = f(0)I + f'(0)(xI + N) + \dfrac{1}{2}f''(0) (x^2 I + 2xN) + \ldots$ $= \displaystyle \sum_0^\infty \dfrac{1}{n!}f^{(n)}(0)(x^nI + nx^{n - 1}N)$ $= \displaystyle \sum_0^\infty \dfrac{1}{n!} f^{(n)}(0)x^nI + \sum_1^\infty \dfrac{1}{n!} f^{(n)}(0)nx^{n - 1}N$ $= \left ( \displaystyle \sum_0^\infty \dfrac{1}{n!} f^{(n)}(0)x^n \right ) I + \left ( \displaystyle \sum_1^\infty \dfrac{1}{(n - 1)!} f^{(n)}(0)x^{n - 1} \right )N; \tag7$

we observe that the coefficient of $I$ is the Taylor series of $f(x)$ and that of $N$ is the Taylor series of $f'(x)$; thus

$f \left (\begin{bmatrix} x & 1 \\ 0 & x \end{bmatrix} \right ) = f(x)I + f'(x)N. \tag 8$

This "expansion" is in fact exact on any interval containing $0$ on which the Taylor series for $f(x)$ and $f'(x)$ converge.
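Formula (0) can be checked numerically for a non-polynomial $f$ such as $\sin$; a sketch of my own (assuming NumPy) that sums the Taylor series of $\sin$ applied to $xI + N$ and compares it with $\sin(x)I + \cos(x)N$:

```python
import numpy as np
from math import factorial, sin, cos

x = 0.9
I = np.eye(2)
N = np.array([[0.0, 1.0],
              [0.0, 0.0]])
X = x * I + N

# derivatives of sin at 0 cycle through 0, 1, 0, -1
deriv_at_0 = [0.0, 1.0, 0.0, -1.0]
F = np.zeros((2, 2))
power = np.eye(2)                          # current power X^n, starting at X^0
for n in range(30):
    F += deriv_at_0[n % 4] / factorial(n) * power
    power = power @ X
closed_form = sin(x) * I + cos(x) * N      # formula (0): f(x) I + f'(x) N
```

Thirty terms are far more than enough for machine-precision agreement at this $x$.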