Is there any geometrical intuition for the factorials in Taylor expansions?

Yes, there is a geometric explanation. For simplicity, let me take $x=0$ and $h=1$. By the Fundamental Theorem of Calculus (FTC),
$$ f(1)=f(0)+\int_{0}^{1}dt_1\ f'(t_1)\ . $$
Now use the FTC for the $f'(t_1)$ inside the integral, which gives
$$ f'(t_1)=f'(0)+\int_{0}^{t_1}dt_2\ f''(t_2)\ , $$
and insert this into the previous equation. We then get
$$ f(1)=f(0)+f'(0)+\int_{0}^{1}dt_1\int_{0}^{t_1}dt_2\ f''(t_2)\ . $$
Keep iterating this, using the FTC to rewrite the last integrand, each time introducing a new variable $t_k$. At the end of the day, one obtains
$$ f(1)=\sum_{k=0}^{n}\int_{\Delta_k} dt_1\cdots dt_k\ f^{(k)}(0)\ +\ {\rm remainder}, $$
where $\Delta_k$ is the simplex
$$ \{(t_1,\ldots,t_k)\in\mathbb{R}^k\ |\ 1>t_1>\cdots>t_k>0\}\ . $$
For example, $\Delta_{2}$ is a triangle in the plane, $\Delta_3$ is a tetrahedron in 3D, etc. Since the integrand $f^{(k)}(0)$ is constant, the $k$-th term is just $f^{(k)}(0)$ times the volume of $\Delta_k$, and the factor $\frac{1}{k!}$ is precisely that volume. Indeed, by a simple change of variables (renaming the coordinates), the volume is the same for all $k!$ simplices of the form
$$ \{(t_1,\ldots,t_k)\in\mathbb{R}^k\ |\ 1>t_{\sigma(1)}>\cdots>t_{\sigma(k)}>0\}, $$
where $\sigma$ is a permutation of $\{1,2,\ldots,k\}$. Putting all these simplices together reproduces, up to a set of measure zero, the cube $[0,1]^k$, which of course has volume $1$.
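As a quick numerical sanity check of the volume claim (a numpy sketch of mine, not part of the argument): sample uniform points in the cube $[0,1]^k$ and count how often the coordinates land in the decreasing order that defines $\Delta_k$; the observed fraction should be close to $1/k!$.

```python
# Monte Carlo sketch: the simplex Delta_k = {1 > t_1 > ... > t_k > 0}
# occupies a fraction 1/k! of the unit cube [0,1]^k.
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
N = 200_000  # number of uniform samples in the cube

for k in range(1, 6):
    t = rng.random((N, k))                         # uniform points in [0,1]^k
    inside = np.all(t[:, :-1] > t[:, 1:], axis=1)  # t_1 > t_2 > ... > t_k
    print(k, inside.mean(), 1 / factorial(k))      # the last two columns should match
```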

Exercise: Recover the usual formula for the integral remainder using the above method.
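As a numerical companion to the exercise (a hedged sketch of mine using scipy; it merely checks the usual integral-remainder formula against the truncated sum for $f=\sin$, it does not carry out the derivation):

```python
# Sketch: numerically confirm the usual integral form of the remainder,
#   f(1) - sum_{k<=n} f^{(k)}(0)/k! = int_0^1 (1-t)^n / n! * f^{(n+1)}(t) dt,
# for f = sin, whose k-th derivative is sin(t + k*pi/2).
import math
from scipy.integrate import quad

f_deriv = lambda k: (lambda t: math.sin(t + k * math.pi / 2))
n = 3
taylor_part = sum(f_deriv(k)(0.0) / math.factorial(k) for k in range(n + 1))
remainder, _ = quad(lambda t: (1 - t)**n / math.factorial(n) * f_deriv(n + 1)(t), 0, 1)
print(math.sin(1.0) - taylor_part, remainder)   # the two numbers agree
```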

Remark 1: As Sangchul said in the comments, the method is related to the notion of ordered exponential. In a basic course on ODEs, one usually sees the notion of the fundamental solution $\Phi(t)$ of a linear system of differential equations $X'(t)=A(t)X(t)$. One can rewrite the equation for $\Phi(t)$ in integral form and perform the same iteration as in the method above, with the result
$$ \Phi(s)=\sum_{k=0}^{\infty}\int_{s\Delta_k} dt_1\cdots dt_k\ A(t_1)\cdots A(t_k)\ . $$
It is only when the matrices $A(t)$ for different times commute that one can use the above permutation-and-cube reconstruction to write this series as an exponential. This happens in one dimension, and also when $A(t)$ is time independent, i.e., in the two textbook cases where one has explicit formulas.
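To see the time-ordered structure concretely, here is a minimal numpy/scipy sketch (my own illustration; the helper name `ordered_propagator` is mine). It builds the discrete time-ordered product $\prod_j (I + A(t_j)\,\Delta t)$, which is what the series above encodes, and checks that for a time-independent $A$ it reproduces $\exp(sA)$, as claimed.

```python
# Sketch: approximate the fundamental solution by the time-ordered product of
# elementary propagators (I + A(t_j) dt) over a partition of [0, s].
import numpy as np
from scipy.linalg import expm

def ordered_propagator(A, s, steps=20_000):
    """Approximate Phi(s) for X'(t) = A(t) X(t), Phi(0) = I."""
    dt = s / steps
    I = np.eye(A(0.0).shape[0])
    Phi = I.copy()
    for j in range(steps):
        Phi = (I + A(j * dt) * dt) @ Phi   # later times act on the left (time ordering)
    return Phi

# Time-independent case: all the A's commute and the series collapses to exp(s A).
A0 = np.array([[0.0, 1.0], [-1.0, 0.3]])
s = 1.5
err = np.max(np.abs(ordered_propagator(lambda t: A0, s) - expm(s * A0)))
print(err)   # small: the discretization error is of order dt
```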

Remark 2: The method I used for the Taylor expansion is related to how Newton approached the question using divided differences. The relation between Newton's iterated divided differences and the iterated integrals I used is provided by the Hermite-Genocchi formula.
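If one wants to see the Hermite-Genocchi formula in action, here is a small Monte Carlo sketch (my addition, using numpy's Dirichlet sampler; the recursive `divided_difference` helper is mine). It uses the standard-simplex form $f[x_0,\ldots,x_k]=\int_{\{u_i\ge 0,\ \sum_i u_i=1\}} f^{(k)}(u_0x_0+\cdots+u_kx_k)\,du_1\cdots du_k$ and compares it with Newton's recursive divided difference for $f=\exp$.

```python
# Sketch: compare Newton's divided difference with the Hermite-Genocchi integral
# over the standard simplex, for f = exp (every derivative of exp is exp).
import numpy as np
from math import factorial

def divided_difference(f, xs):
    """Newton's recursive divided difference f[x_0, ..., x_k]."""
    if len(xs) == 1:
        return f(xs[0])
    return (divided_difference(f, xs[1:]) - divided_difference(f, xs[:-1])) / (xs[-1] - xs[0])

rng = np.random.default_rng(1)
xs = np.array([0.0, 0.4, 1.1, 1.7])                # k = 3 distinct nodes
k = len(xs) - 1
u = rng.dirichlet(np.ones(k + 1), size=400_000)    # uniform points on the simplex
# average of f^(k) over the simplex, times its Lebesgue volume 1/k!
mc = np.mean(np.exp(u @ xs)) / factorial(k)
print(mc, divided_difference(np.exp, xs))          # the two numbers should be close
```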

Remark 3: These iterated integrals are also useful in proving some combinatorial identities; see this MO answer:

https://mathoverflow.net/questions/74102/rational-function-identity/74280#74280

They were also used by K.T. Chen in topology, and they feature in the theory of rough paths developed by Terry Lyons.


The best I can do, as far as (hopefully) nice figures go, is the following.

The simplex $\Delta_1$ (the unit segment) has one-dimensional volume, i.e., length $=1=\frac{1}{1!}$.

The simplex $\Delta_2$ (a right triangle: half of the unit square) has two-dimensional volume, i.e., area $=\frac{1}{2}=\frac{1}{2!}$.

The simplex $\Delta_3$ (a tetrahedron: one sixth of the unit cube) has three-dimensional volume, i.e., just volume $=\frac{1}{6}=\frac{1}{3!}$.

For obvious reasons, I will stop here.


Here is a heuristic argument which I believe naturally explains why we expect the factor $\frac{1}{k!}$.

Assume that $f$ is a "nice" function. Then by linear approximation,

$$ f(x+h) \approx f(x) + f'(x)h. \tag{1} $$

Formally, if we write $D = \frac{\mathrm{d}}{\mathrm{d}x}$, then the above may be recast as $f(x+h) \approx (1 + hD)f(x)$. Now applying this twice, we also have

\begin{align*} f(x+2h) &\approx f(x+h) + f'(x+h) h \\ &\approx \left( f(x) + f'(x)h \right) + \left( f'(x) + f''(x)h \right)h \\ &= f(x) + 2f'(x)h + f''(x)h^2. \tag{2} \end{align*}

If we drop the $f''(x)h^2$ term, which amounts to replacing $f'(x+h)$ by $f'(x)$, $\text{(2)}$ reduces to $\text{(1)}$ with $h$ replaced by $2h$. So $\text{(2)}$ may be regarded as a better approximation to $f(x+2h)$, with the extra term $f''(x)h^2$ accounting for the effect of the curvature of the graph. We also note that, using $D$, we may formally express $\text{(2)}$ as $f(x+2h) \approx (1+hD)^2 f(x)$.

Continuing in this way, we would get

$$ f(x+nh) \approx (1+hD)^n f(x) = \sum_{k=0}^{n} \binom{n}{k} f^{(k)}(x) h^k. \tag{3} $$

So by replacing $h$ by $h/n$,

$$ f(x+h) \approx \left(1 + \frac{hD}{n}\right)^n f(x) = \sum_{k=0}^{n} \frac{1}{n^k} \binom{n}{k} f^{(k)}(x) h^k. \tag{4} $$

Now, since $f$ is "nice", we may hope that the error between the two sides of $\text{(4)}$ vanishes as $n\to\infty$. In that case, using either $\lim_{n\to\infty} \frac{1}{n^k}\binom{n}{k} = \frac{1}{k!}$ or $\lim_{n\to\infty} \left(1 + \frac{a}{n}\right)^n = e^a = \sum_{k=0}^{\infty} \frac{a^k}{k!}$, we obtain

$$ f(x+h) = e^{hD} f(x) = \sum_{k=0}^{\infty} \frac{1}{k!} f^{(k)}(x) h^k . \tag{5} $$


Although the above heuristic leading to $\text{(5)}$ relies on massive hand-waving, the formal relation in $\text{(5)}$ is justified in the context of functional analysis and tells us that $D$ is the infinitesimal generator of the translation semigroup.
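As a purely numerical illustration of $\text{(4)}$ converging to $\text{(5)}$ (a sketch of mine, taking $f=\exp$ so that every derivative is again $\exp$ and the bookkeeping stays trivial):

```python
# Sketch: compare (1 + hD/n)^n f(x) = sum_k (1/n^k) C(n,k) f^{(k)}(x) h^k  -- eq. (4)
# with f(x + h) for growing n, using f = exp.
import math

x, h = 0.3, 1.0
target = math.exp(x + h)
for n in (1, 2, 5, 20, 100, 1000):
    approx = sum(math.comb(n, k) * (h / n) ** k * math.exp(x) for k in range(n + 1))
    print(n, abs(approx - target))   # the error shrinks toward 0 as n grows
```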


The polynomials

$$p_k(h):=\frac{h^k}{k!}$$

have two remarkable properties:

  • they are derivatives of each other, $p_{k+1}'(h)=p_k(h)$,

  • their $n^{th}$ derivative at $h=0$ is $\delta_{kn}$ (i.e. $1$ if $n=k$ and $0$ otherwise).

For this reason, they form a natural basis in which to express a function in terms of its derivatives at a given point: if you form a linear combination with coefficients $c_k$, then evaluating that combination and its derivatives at $h=0$ retrieves exactly the coefficients $c_k$. The denominators $k!$ provide the damping of the fast-growing functions $h^k$ needed for the unit condition to hold; they act as normalization factors.
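A quick symbolic check of these two properties, as a sketch using sympy (entirely optional):

```python
# Sketch: verify the two properties of p_k(h) = h^k / k! symbolically.
import sympy as sp

h = sp.symbols('h')
K = 6
p = [h**k / sp.factorial(k) for k in range(K)]

# p_{k+1}' = p_k
print(all(sp.simplify(sp.diff(p[k + 1], h) - p[k]) == 0 for k in range(K - 1)))

# n-th derivative at h = 0 is the Kronecker delta (n = 0 means plain evaluation)
print(all((p[k] if n == 0 else sp.diff(p[k], h, n)).subs(h, 0) == (1 if n == k else 0)
          for k in range(K) for n in range(K)))
```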


In matrix form, the Taylor expansion of $f$ and of all its derivatives at $h$, in terms of those at $0$, reads
$$\begin{pmatrix}f(h)\\f'(h)\\f''(h)\\f'''(h)\\f''''(h)\\\vdots\end{pmatrix}= \begin{pmatrix}1&h&\frac{h^2}2&\frac{h^3}{3!}&\frac{h^4}{4!}&\cdots \\0&1&h&\frac{h^2}2&\frac{h^3}{3!}&\cdots \\0&0&1&h&\frac{h^2}2&\cdots \\0&0&0&1&h&\cdots \\0&0&0&0&1&\cdots \\&&&\ddots \end{pmatrix} \begin{pmatrix}f(0)\\f'(0)\\f''(0)\\f'''(0)\\f''''(0)\\\vdots\end{pmatrix}$$

Setting $h=0$, the matrix reduces to the identity, which is precisely the statement that evaluating the combination and its derivatives at $h=0$ returns the coefficients:
$$\begin{pmatrix}f(0)\\f'(0)\\f''(0)\\f'''(0)\\f''''(0)\\\vdots\end{pmatrix}= \begin{pmatrix}1&0&0&0&0&\cdots \\0&1&0&0&0&\cdots \\0&0&1&0&0&\cdots \\0&0&0&1&0&\cdots \\0&0&0&0&1&\cdots \\&&&\ddots \end{pmatrix} \begin{pmatrix}f(0)\\f'(0)\\f''(0)\\f'''(0)\\f''''(0)\\\vdots\end{pmatrix}$$
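And a small numerical sketch of the first matrix relation (my addition, using numpy; a degree-5 polynomial makes the truncated $6\times 6$ system exact):

```python
# Sketch: build the upper-triangular matrix above for a concrete h and check
# that it maps the derivatives of f at 0 to the derivatives of f at h.
import numpy as np
from math import factorial

h, K = 0.7, 6
T = np.array([[h**(j - i) / factorial(j - i) if j >= i else 0.0
               for j in range(K)] for i in range(K)])

f = np.polynomial.Polynomial([-4.0, 1.0, 0.0, -2.0, 0.0, 3.0])   # 3x^5 - 2x^3 + x - 4
derivs_at_0 = np.array([(f.deriv(k) if k else f)(0.0) for k in range(K)])
derivs_at_h = np.array([(f.deriv(k) if k else f)(h) for k in range(K)])

print(np.allclose(T @ derivs_at_0, derivs_at_h))   # True: the matrix acts as claimed
```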