Constrained optimization using calculus of variations (entropy maximization)

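We want to maximise the differential entropy $H(p) = -\int_a^b p(t)\log p(t)\,\mathrm dt$ over all probability densities $p$ on $[a,b]$. We can think of this constrained problem as finding the "stationary points" of the functional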

$$ -\int_a^b \left[ p(t) \log p(t) + \lambda p(t) \right] \mathrm d t, $$

where $\lambda$ is a Lagrange multiplier for the constraint that $p$ must integrate to unity on $[a,b]$.

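Since the integrand does not depend on $p'$, the Euler-Lagrange equation reduces to setting the derivative of the integrand with respect to $p$ equal to zero, which in this instance reads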

$$ \log p + \lambda +1 = 0. $$

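Notice that this implies $p(t) = e^{-1-\lambda}$, i.e. $p$ must be constant on $[a,b]$. The normalisation constraint then fixes $e^{-1-\lambda} = \frac{1}{b-a}$, and the only distribution on $[a,b]$ with a constant density is the uniform distribution.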
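As a small worked check of the step above, the sketch below solves the Euler-Lagrange equation together with the normalisation constraint in closed form. The helper name `maxent_density_on_interval` is hypothetical and purely illustrative; it simply evaluates $\lambda = \log(b-a) - 1$ and $p = e^{-1-\lambda}$.

```python
import math


def maxent_density_on_interval(a: float, b: float) -> tuple[float, float]:
    """Hypothetical helper: solve log p + lambda + 1 = 0 with (b - a) * p = 1.

    The Euler-Lagrange equation forces the constant density p = exp(-1 - lambda);
    the normalisation (b - a) * p = 1 then gives lambda = log(b - a) - 1.
    Returns the pair (lambda, p).
    """
    if not b > a:
        raise ValueError("need b > a")
    lam = math.log(b - a) - 1.0
    p = math.exp(-1.0 - lam)
    return lam, p


lam, p = maxent_density_on_interval(0.0, 4.0)
print(lam, p)  # p equals 1 / (b - a) = 0.25 up to floating-point rounding
```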

Observe also that one does not need to appeal to the Euler-Lagrange equation at all: since the integrand involves no derivatives of $p$, we can naively maximise it pointwise in the value $p(t)$, and pointwise maximisation already gives the answer. Hence explaining it to an undergraduate requires only some facility with simple constrained optimisation rather than the calculus of variations (the Lagrange multiplier method is standard material in any good undergraduate economics course).
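To make the pointwise argument concrete, the sketch below maximises the integrand $-p\log p - \lambda p$ numerically for a fixed $\lambda$, treating $p$ as a single non-negative number. Because the integrand does not depend on $t$, the maximiser is the same at every point, which is exactly why the optimal density is constant. The helpers `integrand` and `pointwise_argmax`, the search bound `hi=10.0` and the use of ternary search (valid because the integrand is strictly concave in $p$) are illustrative assumptions, not part of the original argument.

```python
import math


def integrand(p: float, lam: float) -> float:
    """Hypothetical helper: the pointwise objective -p log p - lam * p, with 0 log 0 = 0."""
    return 0.0 if p == 0.0 else -p * math.log(p) - lam * p


def pointwise_argmax(lam: float, hi: float = 10.0, iters: int = 200) -> float:
    """Hypothetical helper: ternary search for the maximiser of the concave integrand on [0, hi].

    Assumes the maximiser exp(-1 - lam) lies inside [0, hi].
    """
    lo = 0.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        # For a concave function, the maximiser cannot lie on the side of the lower value.
        if integrand(m1, lam) < integrand(m2, lam):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


a, b = 0.0, 4.0
lam = math.log(b - a) - 1.0  # the multiplier that makes the constant integrate to 1
p_star = pointwise_argmax(lam)
print(p_star, math.exp(-1.0 - lam))  # both approximately 1 / (b - a) = 0.25
```

Ternary search only resolves the maximiser to roughly the square root of machine precision, because the objective is flat near its peak; that is ample for this illustration.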

If you want to be rigorous, the caveat regarding the Euler-Lagrange method is that it assumes regularity of $p$ that the problem does not actually require. A more general method is to use Jensen's inequality, although you need to 'tweak' it a bit.

The function $g(t)=-t\log t$ for $t\geq 0$ (with $g(0)= 0$) is strictly concave. Therefore, if $\mu$ is a probability measure on some space $(\Omega,{\cal B})$ and $p:\Omega\rightarrow {\Bbb R}_+$ is $\mu$-integrable with $0 < m = \int p \; d\mu<+\infty$, then the tangent line at $m$ lies above the graph: $g(t) \leq g(m) + g'(m)(t-m)$ for all $t\geq 0$, with equality only at $t = m$ by strict concavity. Composing with $p$ and integrating with respect to $\mu$, and using $\int (p(x)-m)\;d\mu(x) = m - m = 0$, we get (this is in fact a proof of Jensen's inequality): $$ \int g(p(x)) \; d\mu(x) \leq g(m) + g'(m)\int (p(x)-m)\;d\mu(x) = g(m),$$ with equality iff $p(x)=m$ for $\mu$-a.e. point $x$.
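The inequality $\int g(p)\,d\mu \le g(m)$ can be checked numerically when $\mu$ is a discrete probability measure, i.e. a finite list of non-negative weights summing to 1. The sketch below assumes exactly that simplification (the argument above allows any probability space); the helper `jensen_gap` is hypothetical and returns $g(m) - \int g(p)\,d\mu$, which should be non-negative and vanish only when $p$ is $\mu$-a.e. constant.

```python
import math
import random


def g(t: float) -> float:
    """g(t) = -t log t for t >= 0, with g(0) = 0."""
    return 0.0 if t == 0.0 else -t * math.log(t)


def jensen_gap(weights: list[float], values: list[float]) -> float:
    """Hypothetical helper: return g(m) - sum_i w_i g(p_i), where m = sum_i w_i p_i.

    weights: non-negative numbers summing to 1 (a discrete probability measure mu).
    values:  the non-negative values p(x_i) at the atoms of mu.
    """
    m = sum(w * v for w, v in zip(weights, values))
    return g(m) - sum(w * g(v) for w, v in zip(weights, values))


rng = random.Random(0)
n = 8
raw = [rng.random() for _ in range(n)]
total = sum(raw)
weights = [r / total for r in raw]              # normalise to a probability measure
values = [rng.uniform(0.0, 3.0) for _ in range(n)]
print(jensen_gap(weights, values))              # strictly positive: p is not constant
print(jensen_gap(weights, [1.7] * n))           # zero up to rounding: p is constant
```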

In the present case, $d\mu(x) = \frac{1}{b-a} {\bf 1}_{[a,b]}(x)\; dx $ (normalized to become a probability measure), and since $m=\int p(x)\; d\mu(x) = \frac{1}{b-a}\int_a^b p(x)\,dx = \frac{1}{b-a}$ we conclude: $$ -\frac{1}{b-a} \int_a^b p(x)\log(p(x))\,dx \leq g(m) = -\frac{1}{b-a} \log \frac{1}{b-a} = \frac{1}{b-a}\log(b-a).$$ Multiplying by $b-a>0$ gives $H(p) = -\int_a^b p(x)\log p(x)\,dx \leq \log(b-a)$, with equality iff $p(x)=\frac{1}{b-a}$ for a.e. point in $[a,b]$.
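As a numerical sanity check of the final bound $H(p) \le \log(b-a)$, the sketch below approximates the differential entropy with the midpoint rule for the uniform density and for one non-uniform density on $[a,b]$. The helper `differential_entropy`, the grid size `n`, and the example density `ramp_density` (a linear ramp $p(x) = 2(x-a)/(b-a)^2$, chosen here only for illustration) are hypothetical. For $b - a = 2$ the ramp has exact entropy $\log\frac{b-a}{2} + \frac12 = 0.5$, below the bound $\log 2$.

```python
import math


def differential_entropy(density, a: float, b: float, n: int = 100_000) -> float:
    """Hypothetical helper: midpoint-rule approximation of H(p) = -int_a^b p(x) log p(x) dx."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        px = density(a + (i + 0.5) * h)
        if px > 0.0:  # use the convention 0 log 0 = 0
            total -= px * math.log(px) * h
    return total


a, b = 1.0, 3.0


def uniform_density(x: float) -> float:
    """The maximiser: constant density 1 / (b - a) on [a, b]."""
    return 1.0 / (b - a)


def ramp_density(x: float) -> float:
    """Hypothetical non-uniform density on [a, b]; it integrates to 1."""
    return 2.0 * (x - a) / (b - a) ** 2


print(math.log(b - a))                              # the bound log(b - a), about 0.6931
print(differential_entropy(uniform_density, a, b))  # attains the bound up to rounding
print(differential_entropy(ramp_density, a, b))     # about 0.5, strictly below the bound
```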
