Gaussian distribution, maximum entropy and the heat equation

Both the Gaussian maximum entropy distribution and the Gaussian solution of the diffusion equation (heat equation) follow from the central limit theorem, which states that the limiting distribution of the suitably normalized sum of i.i.d. random variables with given mean and variance is a Gaussian. The connection between the central limit theorem and the diffusion equation, which describes the continuum limit of a random walk whose steps have zero mean and unit variance, is obvious; the connection between the central limit theorem and the maximum entropy distribution is less obvious. For pointers to the literature on the latter connection, see this MO question and its answers.
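As a quick numerical illustration of the random-walk side of this statement, here is a minimal sketch in plain Python. It compares the exact law of a simple $\pm1$ random walk after $n$ steps with the Gaussian of mean $0$ and variance $n$. The helper name `walk_vs_gaussian` and the step counts are arbitrary choices made for this example.

```python
import math

def walk_vs_gaussian(n):
    """Largest gap between the exact law of a +-1 random walk after n steps
    and the Gaussian approximation with mean 0 and variance n."""
    worst = 0.0
    for j in range(n + 1):
        s = 2 * j - n                       # position after j up-steps and n-j down-steps
        exact = math.comb(n, j) / 2 ** n    # binomial probability of that position
        # Reachable positions are spaced by 2, hence the factor 2 on the density.
        approx = 2.0 * math.exp(-s * s / (2.0 * n)) / math.sqrt(2.0 * math.pi * n)
        worst = max(worst, abs(exact - approx))
    return worst

gap_100 = walk_vs_gaussian(100)
gap_400 = walk_vs_gaussian(400)
print(gap_100, gap_400)
```

The gap shrinks as the number of steps grows, which is the local form of the central limit theorem for this walk.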


Besides the central limit theorem, there is the connection between diffusion and the Wasserstein distance $W_2(p,q)$ (whose square is the minimum of $\int |x-T(x)|^2\,p(x)\,dx$ over maps $T$ transporting $p$ to $q$): for that metric the heat equation is the gradient flow of the entropy, that is, the steepest ascent of $\Phi(p)=-\int p\log p$.

http://www-dimat.unipv.it/savare/Ravello2010/ravelloC.pdf
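To make the transport distance concrete in one dimension: there the optimal map is the monotone rearrangement, so the squared distance is $W_2^2(p,q)=\int_0^1 \big(F_p^{-1}(s)-F_q^{-1}(s)\big)^2\,ds$ in terms of the quantile functions. The sketch below compares a midpoint-rule approximation of that integral with the closed form $(m_1-m_2)^2+(\sigma_1-\sigma_2)^2$ for two Gaussians. The two Gaussians, the quadrature, and the helper name `w2_squared_1d` are assumptions made purely for illustration.

```python
from statistics import NormalDist

def w2_squared_1d(p, q, n=20000):
    """Approximate W_2^2 between two 1D laws from their quantile functions,
    using a midpoint rule on (0, 1)."""
    total = 0.0
    for i in range(n):
        s = (i + 0.5) / n                   # midpoint of the i-th subinterval
        total += (p.inv_cdf(s) - q.inv_cdf(s)) ** 2
    return total / n

p = NormalDist(mu=0.0, sigma=1.0)           # illustrative choice
q = NormalDist(mu=1.0, sigma=2.0)           # illustrative choice
numeric = w2_squared_1d(p, q)
closed_form = (p.mean - q.mean) ** 2 + (p.stdev - q.stdev) ** 2
print(numeric, closed_form)
```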


Here is yet another late answer, but I hope it is relevant. Let me first make clear that I use the mathematical "minus entropy" convention (as is common in my field, which is optimal transport). In other words, for me the entropy of a probability distribution $u$ is $$ \mathcal H(u)=+\int_{\mathbb R}u \log u. $$ A beautiful result due to R. Jordan, D. Kinderlehrer, and F. Otto tells us that one can interpret the heat equation $\partial_t u=\Delta u$ as the gradient flow of the entropy, $$ \partial_t u=-\operatorname{grad}_{W_2} \mathcal H(u) $$ with respect to the quadratic Wasserstein metric $W_2$ over $\mathbb R$ (or $\mathbb R^d$, for that matter). Regardless of the interpretation of this Wasserstein gradient (which would deserve a lot more than this simple post, see e.g. C. Villani's books), this tells us that the entropy (or, again, minus the entropy depending on one's convention) decreases (increases) as fast as possible and tends to be minimized (maximized) along the time evolution.
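The monotonicity of the entropy can be observed numerically. The following is a minimal sketch, not part of the Jordan-Kinderlehrer-Otto result itself: an explicit finite-difference scheme for $\partial_t u=\partial_{xx}u$ on a bounded interval with no-flux ends, started from a box-shaped density. The grid, the time step, the initial datum, and the helper names `entropy` and `heat_step` are all assumptions made for illustration.

```python
import math

def entropy(u, dx):
    """Discrete version of H(u) = integral of u log u (with 0 log 0 = 0)."""
    return sum(v * math.log(v) for v in u if v > 0.0) * dx

def heat_step(u, r):
    """One explicit finite-difference step of u_t = u_xx with no-flux ends;
    r = dt / dx**2 must stay below 1/2 for stability."""
    n = len(u)
    new = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else u[0]            # mirrored ghost cell
        right = u[i + 1] if i < n - 1 else u[n - 1]   # mirrored ghost cell
        new[i] = u[i] + r * (left - 2.0 * u[i] + right)
    return new

dx, dt = 0.1, 0.002                  # grid on [-10, 10] with 201 points
u = [0.0] * 201
for i in range(90, 111):             # box of total mass 1 around the origin
    u[i] = 1.0 / (21 * dx)

history = [entropy(u, dx)]
for step in range(1, 2001):          # evolve up to time t = 4
    u = heat_step(u, dt / dx ** 2)
    if step % 200 == 0:
        history.append(entropy(u, dx))
print(history)
```

With these parameters each step is an averaging of neighbouring values, so the recorded values of $\int u\log u$ decrease from one sample to the next while the total mass stays equal to $1$.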

Now, from the pure PDE perspective it is easy to check that the evolution preserves the mass and the average, and that the variance grows linearly in time, which means here that $$ \int u(t,x)\,dx=1, \qquad \int xu(t,x)\,dx=m, \qquad \int (x-m)^2 u(t,x)\,dx=2t $$ since we started from $u(0,.)=\delta_m(.)$. (Simply differentiate w.r.t. time under the integral signs, use the PDE to substitute $\Delta u$ for $\partial_t u$, and finally integrate by parts. The factor $2$ comes from the normalization $\partial_t u=\Delta u$, as opposed to $\partial_t u=\tfrac12\Delta u$.)

Owing to the very gradient flow structure (entropy tends to diminish as fast as possible) it then becomes very likely (and true indeed!) that the solution $u(t,.)$ of the heat flow at time $t$ minimizes the entropy among all probability distributions satisfying the constraints, i.e. average $m$ and variance $2t$. Of course that's not a proof, but still a damn good hint, I believe.
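As a sanity check on the minimization claim (again, not a proof), one can compare the closed-form values of $\int u\log u$ for a few standard densities sharing the same variance $\sigma^2$. The sketch below does this for a Gaussian, a uniform, and a Laplace density. The choice of competitors, the value of `var`, and the function names are assumptions made for illustration; the mean plays no role since the entropy is translation invariant.

```python
import math

def neg_entropy_gaussian(var):
    # integral of u log u for a Gaussian of variance var: -(1/2) log(2 pi e var)
    return -0.5 * math.log(2.0 * math.pi * math.e * var)

def neg_entropy_uniform(var):
    # a uniform law on an interval of width w has variance w**2 / 12
    width = math.sqrt(12.0 * var)
    return -math.log(width)

def neg_entropy_laplace(var):
    # a Laplace law with scale b has variance 2 * b**2
    b = math.sqrt(var / 2.0)
    return -(1.0 + math.log(2.0 * b))

var = 1.0                            # common variance, free parameter of the sketch
values = {
    "gaussian": neg_entropy_gaussian(var),
    "laplace": neg_entropy_laplace(var),
    "uniform": neg_entropy_uniform(var),
}
print(values)
```

Among these three, the Gaussian gives the smallest value of $\int u\log u$ (equivalently, the largest physical entropy), and the ordering does not depend on the common variance.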