What's umbral calculus about?

Umbral relations shadow those of the basic binomial transform, revealing underlying connections between diverse areas of math (as Leibnitz himself predicted--see H. Davis "Theory of Linear Operators"):

I) Umbral notation is brief and suggestive (courtesy of Blissard and contemporaries):

$ \displaystyle (a.)^n= a_n \;  \;  \; $ (umbral variable and lowering of superscript).

Expressing binomial convolution simply:

$$  \displaystyle (a. + b.)^{n} = \sum_{k=0}^{n}  \binom{n}{k} a_{k} b_{n-k} \; \;$$ (be careful to evaluate $(a.+b.)^0=a_0b_0$ and $(a.+b.)^1=a_0b_1+a_1b_0$), $$ \displaystyle e^{a. \; x}= \sum_{n \ge 0} a_n \frac{x^n}{n!} \; \; ,$$

$$ \displaystyle e^{a.\;x}\; e^{b.\; x} = e^{(a. + b.)x}\; \; .$$

A more precise notation is to use $\langle a.^n \rangle = a_n$ to clearly specify when the lowering op, or evaluation of an umbral quantity, is to be done. E.g.,

$$\langle a.^n a.^m\rangle=\langle a.^{n+m}\rangle= a_{n+m} \ne a_n a_m= \langle a.^n\rangle\langle a.^m\rangle$$

and $$\langle\exp[\ln(1+a.x)]\rangle=\langle(1+a.x)\rangle=1+a_1x$$

$$\ne \exp[\langle\ln(1+a.x)\rangle]=\exp \left[ \sum_{n \ge 1} \langle\frac{a.^nx^n}{n}\rangle\right]=\exp\left[\sum_{n \ge 1} \frac{a_nx^n}{n}\right]\; .$$

II) Same for umbralized ops, allowing succint specification and derivation of many relations, especially among special functions. A good deal of umbral calculus is about defining these ops for special sequences, such as the falling $(x)_{n}=x!/(x-n)!$ and rising factorials $(x)_{\bar{n}}=(x+n-1)!/(x-1)!$ and Bell polynomials $\phi_n(x)$.

Examples:

$ (:AB:)^n = A^n B ^n$  (defn. for order preserving exponentiation for any operators )

$$ (xD)^n = (\phi.(:xD:))^n = \phi_n(:xD:) \; \; ,$$

$$ e^{txD} = e^{t \phi.(:xD:)} \; \; .$$

From $xD \; x^{n} = n \; x^{n}$, it's easy to derive

$$e^{t\phi.(x)} = e^{x (e^t-1)} \; \; .$$

(See this MO-Q for the o.g.f.)

III) Umbral compositional inverse pairs allow for easy derivations of combinatorial identities and reveal associations among different reps of operator calculi.

Look at how this connects the distributive operator exponentiation $:xD:^n=x^nD^n$ to umbral lowering of superscripts. The falling factorials and Bell polynomials are an umbral inverse pair, i.e., $\phi_n((x).)=x^n=(\phi.(x))_n$. This is reflected in the functions $\log(1+t)$ and $e^t-1$, defining their e.g.f.s $e^{x\log(1+t)}$ and $e^{x(e^t-1)}$, being regular compositional inverses and to the lower triangular matrices containing the coefficients for the polynomials (the Stirling numbers of the first and second kinds) being multiplicative inverses, so we can move among many reps to find and relate many formulas. For the derivative op rep,

$$((xD).)^n=(xD)_n=x^nD^n=:xD:^n=(\phi.(:xD:)).^n=(\phi.(:xD:))_n,$$

so we have a connection to the umbral lowering of indices

$$:xD:^n=((xD).)^n=(xD)_n=x^nD^n.$$

IV) The generalized Taylor series or shift operator is at the heart of umbral calculus:

$$ e^{p.(x)D_y}f(y) = f(p.(x)+y)  \; , $$

(e.g., this entry on A class of differential operators and another on the Bernoulli polynomials) with special cases

$$ e^{:p.(x) D_x:} f(x) = f(p.(x) + x) \; , $$ and

$$ e^{-(1-q.(x))D_y}y^{s-1} \; |_{y=1} = (1-(1-q.(x)))^{s-1} \; ,$$ giving a Gauss-Newton interpolation of $q_n(x)$ (shadows of the binomial relations).

It can often be used to easily reveal interesting combinatorial relations among operators. A simple example:

$$ e^{txD} f(x) = e^{t\phi.(:xD:)} f(x) = e^{(e^t-1):xD:} f(x) = f(e^{t}x) \; .$$ You could even umbralize $t$ to obtain the Faa di Bruno formula. Try discovering some op relations with the Laguerre polynomials (hint--look at $:Dx:^n= D^nx^n$. Cf. Diff ops and confluent hypergeometric fcts.).

As another example (added May 2015) of the interplay between differential operators, umbral calculus, and finite differences, note the relations for the Bell polynomials

$$\phi_{n}(:xD_x:)= \sum_{k=0}^n S(n,k)x^kD_x^k = (xD_x)^n=\sum_{j=0}^\infty j^n \frac{x^jD^j_{x=0}}{j!}=\sum_{j=0}^\infty (-1)^j \left[\sum_{k=0}^j(-1)^k \binom{j}{k}k^n\right] \frac{x^jD_x^j}{j!} \;$$

and apply these operators on $x^m$, $e^{x}$, and $x^s$. (The S(n,k) are the Stirling numbers of the second kind.)

I've used the power monomials $x^n$ and their associated raising and lowering ops, $x$ and $D_x$, but these relations are shadowed by the raising and lowering ops of all umbral sequences $p_n(x)$ such that $R \; p_n(x) = p_{n+1}(x)$ and $L \; p_n(x)  = n \;  p_{n-1}(x)$. (Shadows of Lie and quantum mechanics here also.)

V) (Added Sept. 2020): The generalized Chu-Vandermonde identitiy for the discrete convolution of binomial coefficients--integral to understanding properties of confluent hypergeomeric functions and their diff op reps--is easily derived from the umbral Sheffer calculus.

Binomial Sheffer sequence of polynomials (BSP) have e.g.f.s of the form

$$e^{x \; h(t)} = e^{t \; B.(x)},$$

where $h(t)$ and is invertible and vanishes at the origin. This implies

$$e^{(x+y)h(t)} = e^{t \; B.(x+y)} = e^{xh(t)}e^{yh(t)} = e^{t \; B.(x)}e^{t B.(y)} = e^{t(B.(x)+B.(y))},$$ so follows the accumulation property $$(B.(x)+B.(y))^n = B_n(x+y).$$ The Stirling polynomials of the first kind, $ST1_n(x) = (x)_n$, are a BSP with $h(t)=\ln(1+t)$ and

$$\binom{x}{k} = \frac{ST1_k(x)}{k!},$$ so $$(ST1.(x)+ST1.(y))^n = ST1_n(x+y)$$

implies directly the Chu-Vandermonde identity

$$\binom{x+y}{n} = \sum_{k=0}^n \binom{x}{k} \binom{y}{n-k}.$$


From the Wikipedia article: "The combinatorialist John Riordan in his book Combinatorial Identities published in the 1960s, used techniques of this sort extensively." There you see the classical umbral calculus. Basically he pretends subscripts are exponents, and somehow it works. Take a look at that book. It doesn't require a lot of apparatus.

The 1978 paper by Roman and Rota, cited in the article, is the beginning of a technique for making the classical umbral calculus rigorous.

The 1975 paper by Rota, Kahaner, and Odlyzko appears to be a paper about Sheffer sequences, which are certain sequences of polynomials (see the Wikipedia article titled "Sheffer sequence"). If you lay that paper and the 1978 paper side-by-side, you can see that they're really two different ways of looking at the same thing.

In the mean time, look at the concrete examples in the Wikipedia article that you cited.

Is it useful? I think one could argue about that. But I don't want to try to make the case for its utility in research.