Intuition for the definition of the Gamma function?

I haven't quite got this straight yet, but I think one way to go is to think about choosing points at random from the positive reals. This answer is going to be rather longer than it really needs to be, because I'm thinking about this in a few (closely related) ways, which probably aren't all necessary; you can reject the uninteresting parts and keep anything of value. Very roughly, the idea is that if you "randomly" choose points from the positive reals and arrange them in increasing order, then the probability that the $(n+1)^\text{th}$ point is in a small interval $(t,t+dt)$ is a product of probabilities of independent events: $n$ factors of $t$ for choosing $n$ points in the interval $[0,t]$, one factor of $e^{-t}$ as all the other points are in $[t,\infty)$, one factor of $dt$ for choosing the point in $(t,t+dt)$, and a denominator of $n!$ coming from the reordering. So, as an exercise in making a simple problem much harder, here goes...

I'll start with a bit of theory before trying to describe intuitively why the probability density $\dfrac{t^n}{n!}e^{-t}$ pops out.

We can look at the homogeneous Poisson process (with rate parameter $1$). One way to think of this is to take a sequence of independent exponentially distributed random variables with rate parameter $1$, $S_1,S_2,\ldots$, and set $T_n=S_1+\cdots+S_n$. As has been commented on already, $T_{n+1}$ has the probability density function $\dfrac{t^n}{n!}e^{-t}$. I'm going to avoid proving this immediately though, as it would just reduce to manipulating some integrals. Then, the Poisson process $X(t)$ counts the number of points $T_i$ lying in the interval $[0,t]$.
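For anyone who wants to see this density appear concretely, here is a minimal simulation sketch (assuming `numpy` is available; none of this is part of the argument):

```python
# Sketch: build T_{n+1} = S_1 + ... + S_{n+1} from independent Exp(1)
# variables and compare its histogram with the claimed density t^n e^{-t}/n!.
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
n, trials = 4, 200_000

# sum of n + 1 independent exponentials with rate 1
T = rng.exponential(scale=1.0, size=(trials, n + 1)).sum(axis=1)

# empirical density versus t^n e^{-t} / n! on a grid
hist, edges = np.histogram(T, bins=60, range=(0, 15), density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
claimed = mids**n * np.exp(-mids) / factorial(n)
print(np.abs(hist - claimed).max())  # small, up to sampling noise
```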

We can also look at Poisson point processes (aka Poisson random measures, though that Wikipedia page is very poor). This just makes rigorous the idea of randomly choosing unordered sets of points from a sigma-finite measure space $(E,\mathcal{E},\mu)$. Technically, it can be defined as a set of nonnegative integer-valued random variables $\{N(A)\colon A\in\mathcal{E}\}$ counting the number of points chosen from each subset $A$, such that $N(A)$ has the Poisson distribution of rate $\mu(A)$ and $N(A_1),N(A_2),\ldots$ are independent for pairwise disjoint sets $A_1,A_2,\ldots$. By definition, this satisfies $$ \begin{array}{}\mathbb{P}(N(A)=n)=\dfrac{\mu(A)^n}{n!}e^{-\mu(A)}.&&(1)\end{array} $$ The points $T_1,T_2,\ldots$ above defining the homogeneous Poisson process also define a Poisson random measure with respect to the Lebesgue measure $(\mathbb{R}_+,\mathcal{B},\lambda)$, once you forget about the order in which they were defined and just regard them as a random set; this forgetting, I think, is the source of the $n!$. If you think about the probability of $T_{n+1}$ being in a small interval $(t,t+\delta t)$, then this is just the same as having $N([0,t])=n$ and $N((t,t+\delta t))=1$, which has probability $\dfrac{t^n}{n!}e^{-t}\delta t$.
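As a numerical sanity check on $(1)$ (again just a sketch, assuming `numpy`):

```python
# Sketch: points of the rate-1 Poisson process are cumulative sums of
# Exp(1) spacings; check P(N([0,t]) = n) against t^n e^{-t} / n!.
import numpy as np
from math import factorial, exp

rng = np.random.default_rng(1)
t, trials = 3.0, 100_000

# 50 spacings are plenty: their sum is around 50, far beyond t = 3
pts = np.cumsum(rng.exponential(1.0, size=(trials, 50)), axis=1)
counts = (pts <= t).sum(axis=1)  # N([0, t]) in each trial

for n in range(5):
    empirical = (counts == n).mean()
    theoretical = t**n * exp(-t) / factorial(n)  # (1) with mu([0,t]) = t
    print(n, round(empirical, 4), round(theoretical, 4))
```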

So, how can we choose points at random so that each small set $\delta A$ has probability $\mu(\delta A)$ of containing a point, and why does $(1)$ pop out? I'm imagining a hopeless darts player randomly throwing darts about and, purely by luck, hitting the board with some of them. Consider throwing a very large number $N\gg1$ of darts, independently, so that each one only has probability $\mu(A)/N$ of hitting the set $A$, and is distributed according to the probability distribution $\mu/\mu(A)$. This is consistent, at least, if you think about the probability of hitting a subset $B\subseteq A$. The probability of missing with all of them is $(1-\mu(A)/N)^N\to e^{-\mu(A)}$ as $N\to\infty$. This is a multiplicative function due to independence of the numbers hitting disjoint sets. To get the probability of exactly one dart hitting the set, multiply by $\mu(A)$ (one factor of $\mu(A)/N$ for each individual dart, multiplied by $N$ because any of the $N$ darts could be the one that hits). For $n$ darts, we multiply by $\mu(A)$ $n$ times, for picking $n$ darts to hit, then divide by $n!$ because we have over-counted the subsets of size $n$ by this factor (due to counting all $n!$ ways of ordering them). This gives $(1)$. I think this argument can probably be cleaned up a bit.
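The limit being used here is just the classical binomial-to-Poisson convergence, which is easy to watch numerically (plain Python, names purely illustrative):

```python
# Sketch: with N darts, each hitting A independently with probability
# mu(A)/N, the number of hits is Binomial(N, mu(A)/N); as N grows this
# approaches the Poisson(mu(A)) probabilities of equation (1).
from math import comb, exp, factorial

mu_A = 2.0
for N in (10, 100, 10_000):
    p = mu_A / N
    for n in range(4):
        binomial = comb(N, n) * p**n * (1 - p)**(N - n)
        poisson = mu_A**n * exp(-mu_A) / factorial(n)
        print(N, n, round(binomial, 5), round(poisson, 5))
```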

Getting back to choosing points randomly on the positive reals, this gives a probability of $\dfrac{t^n}{n!}e^{-t}dt$ of picking $n$ in the interval $[0,t]$ and one in $(t,t+dt)$. If we sort them in order as $T_1\lt T_2\lt\cdots$ then $\mathbb{P}(T_1\gt t)=e^{-t}$, so $T_1$ is exponentially distributed. Conditional on this, $T_2,T_3,\ldots$ are chosen randomly from $[T_1,\infty)$, so we see that the differences $T_{i+1}-T_{i}$ are independent and identically distributed, each exponential with rate $1$.
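This spacing claim is also easy to test (a sketch, assuming `numpy`): scatter Poisson-many uniform points on a long interval, sort them, and look at the first two gaps.

```python
# Sketch: realize the process by placing Poisson(L)-many uniform points
# on [0, L]; after sorting, the gaps T_1 and T_2 - T_1 behave like
# independent Exp(1) variables (L is large so edge effects are negligible).
import numpy as np

rng = np.random.default_rng(2)
L, trials = 50.0, 100_000

gap1 = np.empty(trials)
gap2 = np.empty(trials)
for i in range(trials):
    pts = np.sort(rng.uniform(0.0, L, size=rng.poisson(L)))
    gap1[i], gap2[i] = pts[0], pts[1] - pts[0]

print(gap1.mean(), gap2.mean())       # both close to 1
print(np.corrcoef(gap1, gap2)[0, 1])  # close to 0
```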

Why is $\dfrac{t^n}{n!}e^{-t}$ maximized at $t=n$? I'm not sure why the mode should be a simple property of a distribution; it isn't even well defined except for unimodal distributions. As $T_{n+1}$ is the sum of $n+1$ IID random variables of mean one, the law of large numbers suggests that it should be peaked approximately around $n$. The central limit theorem goes further, and gives $\dfrac{t^n}{n!}e^{-t}\approx\dfrac{1}{\sqrt{2\pi n}}e^{-(t-n)^2/{2n}}$. Stirling's formula is just this evaluated at $t=n$.
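You can watch Stirling's formula emerge this way (plain Python): evaluate the exact density and the Gaussian approximation at $t=n$ and take the ratio.

```python
# Sketch: the ratio of n^n e^{-n} / n! to 1 / sqrt(2 pi n) tends to 1,
# which is exactly Stirling's formula n! ~ sqrt(2 pi n) (n/e)^n.
from math import exp, factorial, pi, sqrt

for n in (5, 20, 100):
    exact = n**n * exp(-n) / factorial(n)  # density t^n e^{-t} / n! at t = n
    gaussian = 1.0 / sqrt(2 * pi * n)      # CLT approximation at its peak
    print(n, exact / gaussian)             # tends to 1 as n grows
```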

What has this got to do with Tate's thesis? I don't know; I haven't read it (though I intend to), and only have a vague idea of what it's about. If there is a connection, maybe it is something to do with the fact that we are relating the sums of independent random variables $S_1+\cdots+S_n$ distributed with respect to the Haar measure on the multiplicative group $\mathbb{R}_+$ (edit: oops, that's not true, the multiplicative Haar measure has cumulative distribution given by $\log$, not $\exp$) with randomly chosen sets according to the Haar measure on the additive group $\mathbb{R}$.


The geometric approach works.

Let’s compute the volume of the $2n$-dimensional ball, $D^{2n}$, in two ways. One way is extremely clever but has been known for centuries and provides interesting insights: it’s based on Liouville’s trick. Specifically, we will compute two integrals in polar coordinates, one of which is the volume of the ball and the other of which reduces to a product of one-dimensional integrals. Both integrands will depend (at most) on the radial coordinate $r$, which lets us separate out the surface area of the boundary of the ball as a common factor. Write this surface area as $S_{2n-1}$.

There’s essentially just one way to do this trick: integrate $\exp(-r^2)$. Its integral over $\mathbb{R}^{2n}$ equals

$$S_{2n-1} \int_0^\infty {\exp\left(- r^2 \right) r^{2n-1} dr}.$$

However, because $r^2 = x_1^2 + x_2^2 + \ldots + x_{2n}^2$, the integrand (in Cartesian coordinates $\left( x_1, x_2, \ldots, x_{2n} \right)$) factors as $\exp\left(-r^2 \right) = \exp\left(-x_1^2 \right) \cdots \exp\left(-x_{2n}^2 \right)$, each of which must be integrated from $-\infty$ to $+\infty$. Whence

$$S_{2n-1} \int_0^\infty {\exp \left(- r^2 \right) r^{2n-1} dr} = \left( \int_{- \infty}^ \infty {\exp \left( -x^2 \right) dx} \right) ^{2n}.$$

I will call the left hand integral $\tfrac{1}{2} \Gamma \left(n \right)$, because that is what it is (as a simple change of variables shows). In the same notation, $\Gamma \left(1/2 \right) = \int_{-\infty}^\infty {\exp\left(-x^2 \right) dx}$. Algebraic re-arrangement of the foregoing yields the volume of $D^{2n}$ as

$$|D^{2n} | = S_{2n - 1} \int_0^1 {r^{2n - 1} dr} = \frac{{S_{2n - 1} }} {{2n}} = \frac {\Gamma \left(1/2 \right)^{2n}} { n \Gamma \left(n \right) }.$$

That was the first method: the result is a familiar one, but has been left expressed in a way that better reveals its origins in polar and Cartesian integration.
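Before moving on, here is a quick numeric check of both claims (a sketch, assuming `scipy` is available; `math.gamma` of course already knows the value of $\Gamma(1/2)$ that is derived below):

```python
# Sketch: check (i) the radial integral equals Gamma(n)/2 (substitute
# u = r^2 to see this analytically) and (ii) Gamma(1/2)^(2n) / (n Gamma(n))
# matches pi^n / n!, the usual volume of the unit ball in 2n dimensions.
from math import exp, gamma, pi, factorial
from scipy.integrate import quad

for n in range(1, 6):
    radial, _ = quad(lambda r: exp(-r**2) * r**(2*n - 1), 0, float("inf"))
    derived = gamma(0.5)**(2*n) / (n * gamma(n))
    print(n, radial, gamma(n) / 2, derived, pi**n / factorial(n))
```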

The next way to compute the ball's volume is, I believe, new. It is inspired by Cavalieri’s Principle: the idea that you can shift slices of a solid around without changing the volume of the solid. The generalization is to move two-dimensional slices around and to change their shapes while you do so, but in a way that does not change their areas. It follows that the new solid has the same hypervolume as the original, although it might have a completely different shape.

We will compute the volume of a region $Q_n$ in $\mathbb{R}^{2n}$. It is conveniently described by identifying $\mathbb{R}^{2n}$ with $\mathbb{C}^{n}$, using coordinates $z_i = \left( x_{2i - 1}, x_{2i} \right)$, in terms of which

$$Q_n = \{ \mathbf{z} \in \mathbb{C}^n :0 \leq \left| {z_1 } \right| \leq \left| {z_2 } \right| \leq \cdots \leq \left| {z_n } \right| \leq 1 \}.$$

If these were real variables, we could make the volume-preserving transformation $w_1 = z_1, w_2 = z_2 - z_1, \ldots, w_i = z_i - z_{i-1}, \ldots, w_n = z_n - z_{n-1}$, with the sole restriction that the sum of the $w_i$ (all of which are nonnegative) not exceed $1$. Because they are complex variables, though, we have to consider the area of the annulus bounded by the radii $|z_{i-1}|$ and $|z_i|$: it is proportional to $|z_i|^2 - |z_{i-1}|^2$. The disk of the same area has radius $|w_i|$ for which $|w_i|^2 = |z_i|^2 - |z_{i-1}|^2$. Therefore, if we define new variables $w_i$ according to this formula, we obtain a new region, one of substantially different shape, having the same volume. This region is defined by $|w_1|^2 + \cdots + |w_n|^2 \le 1$: that is, it’s our old friend $D^{2n}$. Therefore, the volume of $Q_n$ equals the volume of $D^{2n}$.
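A Monte Carlo check of this volume equality (a sketch, assuming `numpy`; it anticipates the observation below that only the ordering of the moduli matters):

```python
# Sketch: estimate |Q_n| as pi^n times the probability that the moduli of
# n independent uniform points in the unit disk land in increasing order,
# and compare with |D^{2n}| from the first method's formula.
import numpy as np
from math import pi, gamma

rng = np.random.default_rng(3)
n, trials = 3, 200_000

# |z| for a uniform point in the unit disk is distributed as sqrt(U)
R = np.sqrt(rng.uniform(size=(trials, n)))
frac = np.all(R[:, :-1] <= R[:, 1:], axis=1).mean()

vol_Q = pi**n * frac                        # polydisk volume times fraction
vol_D = gamma(0.5)**(2*n) / (n * gamma(n))  # |D^{2n}| from the first method
print(vol_Q, vol_D)
```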

Now for the punch line: $Q_n$ is a fundamental domain for the action of $S[n]$, the symmetric group, on the product of $n$ disks $T^{2n} = \left( D^2 \right) ^n$; $S[n]$ acts by permuting the complex coordinates $z_1, \ldots, z_n$. The volume of $T^{2n}$ equals $|D^2|^n = \pi ^n$. Writing $|S[n]|$ for the number of permutations and equating our two completely different calculations of the volume of the $2n$-ball gives

$$\pi ^ n / |S[n]| = \frac {\Gamma \left(1/2 \right)^{2n}} { n \Gamma \left(n \right) },$$

whence

$$|S[n]| = \frac{{\pi ^n n\Gamma \left( n \right)}}{{\Gamma \left( {1/2} \right)^{2n} }}.$$

This simplifies: the volume formula for $n = 1$ must give the area of the unit disk, equal to $\pi$, whence $\Gamma \left( 1/2 \right)^2 = \pi$. Finally, then,

$$|S[n]| = n\Gamma \left( n \right).$$

I will finish by remarking that Liouville's method is a perfectly natural thing to encounter when working with the multivariate Normal distribution, so it's not really an isolated trick, but is rather a pretty basic result expressing a defining property of Normal (Gaussian) variates. There are, of course, many other ways to compute the volume of $D^{2n}$, but this one gives us the Gamma function directly.


Sorry to "revive," but here's something I noticed while writing 2012 Fall OMO Problem 25. IMO it gives a kind of neat perspective on the gamma function (as a particular case of a "continuous" generating function), so hopefully this is not too off-topic. It may be somewhat related to George Lowther's answer above, but I don't have the background to fully understand/appreciate his post. Also, there might be a bit of seemingly irrelevant setup here.

Anyways, first consider the following "discrete" problem:

Find the number of integer solutions to $a+b+c+d=18$ with $0\le a,b,c,d\le 6$.

This is fairly standard via PIE or generating functions: for a decent discussion, see this AoPS thread.

Now consider a close "continuous" variant:

Let $N$ be a positive integer and $M$ be a positive real number. Find the probability that $y_1,y_2,\ldots,y_{N-1}\le M$ for a random solution in nonnegative reals to $y_0+y_1+\cdots+y_N=1$.

The direct generalization would have $y_0,y_N\le M$ as well, but I'll keep it like this since the OMO problem uses this version (and both versions allow essentially the same solution).

It's easy to generalize the PIE solution of the discrete version to the continuous one, but here's what I got when I tried to generalize the generating functions solution (unfortunately it's not really rigorous, but I feel like it should all be correct).

To extend the discrete generating functions solution, suppose we work with formal power "integrals" indexed by nonnegative reals instead of formal power series (indexed by nonnegative integers). As a basic example, the integral $\int_0^\infty x^t\; dt$ has a $1$ coefficient for each $x^t$ term, and thus corresponds to $y_0,y_N$; the discrete analog would be something like $\sum_{t=0}^{\infty}x^t$. For $y_1,\ldots,y_{N-1}$, which are bounded above by $M$, we instead have $\int_0^M x^t\; dt$. By "convolution" the desired probability is then $$\frac{[x^1](\int_0^M x^t\; dt)^{N-1}(\int_0^\infty x^t\; dt)^2}{[x^1](\int_0^\infty x^t\; dt)^{N+1}}.$$

But $\int_0^M x^t\; dt = \frac{x^M - 1}{\ln{x}}$ and, for positive integers $L$, $(-\ln{x})^{-L} = \frac{1}{(L-1)!}\int_0^\infty x^t t^{L-1}\; dt$. Note that this is essentially the gamma function when $x=e^{-1}$, and for $L=1$ we have $\int_0^\infty x^t\; dt = (-\ln{x})^{-1}$. This is not hard to prove by differentiating (w.r.t. $x$) under the integral sign, but for integer $L$ there's a simple combinatorial proof of $$\frac{1}{(L-1)!}\int_0^\infty x^t t^{L-1}\; dt = \left(\int_0^\infty x^t\; dt\right)^L$$ which may provide a bit of intuition for the gamma function.

Indeed, suppose we choose $L-1$ points at random from the interval $[0,t]$; the resulting sequence is nondecreasing with probability $\frac{1}{(L-1)!}$. These $L-1$ "dividers" split the interval $[0,t]$ into $L$ nonnegative reals adding up to $t$, which corresponds to the convolution on the RHS. It's not hard to check that this yields a bijection, which explains the $[x^t]$ coefficient of the LHS ($\frac{t^{L-1}}{(L-1)!}$). (Compare with the classical "balls and urns" or "stars and bars" argument for the discrete analog. In fact, it shouldn't be hard to interpret the $[x^t]$ coefficient as a limiting case of the discrete version.)
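A quick numeric check of the displayed identity (a sketch, assuming `scipy` is available), at an arbitrary $x\in(0,1)$:

```python
# Sketch: verify (1/(L-1)!) * integral_0^inf x^t t^(L-1) dt = (-ln x)^(-L)
# by numerical quadrature at a sample point x in (0, 1).
from math import factorial, log
from scipy.integrate import quad

x, L = 0.3, 4
integral, _ = quad(lambda t: x**t * t**(L - 1), 0, float("inf"))
lhs = integral / factorial(L - 1)
rhs = (-log(x))**(-L)
print(lhs, rhs)  # agree to quadrature accuracy
```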

Now back to the problem: by "convolution" again (but used in a substantive way this time) we find $$\begin{align*} [x^1]\left(\int_0^M x^t\; dt\right)^{N-1}\left(\int_0^\infty x^t\; dt\right)^2 &=[x^1](1-x^M)^{N-1}(-\ln{x})^{-(N-1)}(-\ln{x})^{-2} \\ &=[x^1](1-x^M)^{N-1}\frac{1}{N!}\int_0^\infty x^t t^N\; dt \\ &=\frac{1}{N!}\sum_{0\le k\le \min(N-1,\,1/M)}(-1)^k\binom{N-1}{k}(1-kM)^N \end{align*}$$ (the sum runs over those $k$ for which $1-kM\ge 0$) and $$[x^1]\left(\int_0^\infty x^t\; dt\right)^{N+1} =[x^1](-\ln{x})^{-(N+1)} =[x^1]\frac{1}{N!}\int_0^\infty x^t t^N\; dt =\frac{1}{N!}.$$ The desired probability thus comes out to $\sum_{0\le k\le \min(N-1,\,1/M)}(-1)^k\binom{N-1}{k}(1-kM)^N$.
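And a Monte Carlo check of the final answer (a sketch, assuming `numpy`): a uniformly random solution of $y_0+\cdots+y_N=1$ is given by the spacings of $N$ independent uniform points in $[0,1]$, with $y_1,\ldots,y_{N-1}$ the interior spacings.

```python
# Sketch: compare the empirical probability that all interior spacings
# stay below M with the formula sum_k (-1)^k C(N-1, k) (1 - kM)^N.
import numpy as np
from math import comb, floor

rng = np.random.default_rng(4)
N, M, trials = 5, 0.3, 200_000

u = np.sort(rng.uniform(size=(trials, N)), axis=1)
interior = np.diff(u, axis=1)  # y_1, ..., y_{N-1}
empirical = np.all(interior <= M, axis=1).mean()

kmax = min(N - 1, floor(1 / M))  # keep 1 - kM >= 0
exact = sum((-1)**k * comb(N - 1, k) * (1 - k * M)**N for k in range(kmax + 1))
print(empirical, exact)
```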

You can see how this explicitly applies to the OMO problem in my post here. Everything I wrote above is based on the "Official 'Continuous' Generating Function Solution" there.