Why is the log-of-sum-of-exponentials $f(x)=\log\left(\sum_{i=1}^n e^{x_i}\right)$ a convex function for $x \in\mathbb R^n$?

Proof:

Fix $\theta \in (0,1)$ and let $u_i=e^{x_i}$, $v_i=e^{y_i}$. Then $f(\theta x+(1-\theta)y)=\log\left(\sum_{i=1}^n e^{\theta x_i + (1-\theta)y_i}\right)=\log\left(\sum_{i=1}^n u_i^{\theta} v_i^{1-\theta}\right)$.

From Hölder's inequality:

$$\sum_{i=1}^n x_iy_i \le \left(\sum_{i=1}^n|x_i|^p\right)^{\frac{1}{p}} \cdot \left(\sum_{i=1}^n|y_i|^q\right)^{\frac{1}{q}}$$ where $p, q > 1$ and $1/p+1/q=1$.
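As a sanity check on the form of Hölder's inequality used here, the following is a minimal numerical sketch (not a proof). The helper name `holder_gap` and the sampling ranges are illustrative assumptions, not part of the original argument.

```python
import random

def holder_gap(a, b, p):
    """Return RHS - LHS of Hölder's inequality for an exponent p > 1 (hypothetical helper)."""
    q = p / (p - 1.0)  # conjugate exponent, chosen so that 1/p + 1/q = 1
    lhs = sum(ai * bi for ai, bi in zip(a, b))
    rhs = (sum(abs(ai) ** p for ai in a) ** (1.0 / p)
           * sum(abs(bi) ** q for bi in b) ** (1.0 / q))
    return rhs - lhs

random.seed(0)
gaps = []
for _ in range(1000):
    n = random.randint(1, 6)
    a = [random.uniform(-3, 3) for _ in range(n)]
    b = [random.uniform(-3, 3) for _ in range(n)]
    p = random.uniform(1.1, 5.0)
    gaps.append(holder_gap(a, b, p))

# The gap should never be negative, up to floating-point rounding.
print(min(gaps) >= -1e-9)
```

With $p=q=2$ and $a=b$ the two sides coincide, which is the equality case of Cauchy-Schwarz.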

Applying this inequality to $f(\theta x+(1-\theta)y)$: $$\log\left(\sum_{i=1}^n u_i^{\theta} v_i^{1-\theta}\right) \le \log\left[\left(\sum_{i=1}^n u_i^{\theta \cdot \frac{1}{\theta}}\right)^{\theta} \cdot \left(\sum_{i=1}^n v_i^{(1-\theta) \cdot \frac{1}{1-\theta}}\right)^{1-\theta}\right]$$ The right-hand side reduces to:

$$\theta \log\sum_{i=1}^n u_i+(1-\theta)\log\sum_{i=1}^n v_i$$

Here I regard $\theta$ as $\frac{1}{p}$ and $1-\theta$ as $\frac{1}{q}$.

So I obtain $f(\theta x+(1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$, which is the definition of convexity (the cases $\theta=0$ and $\theta=1$ hold trivially).
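The resulting inequality can also be spot-checked numerically. The sketch below is only an illustration under assumed sampling ranges; `logsumexp` and `convexity_gap` are hypothetical helper names, and the max-shift inside `logsumexp` is a standard trick to avoid overflow, not part of the proof.

```python
import math
import random

def logsumexp(x):
    """Compute log(sum(exp(x_i))), shifted by max(x) to avoid overflow."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

def convexity_gap(x, y, theta):
    """Return theta*f(x) + (1-theta)*f(y) - f(theta*x + (1-theta)*y)."""
    z = [theta * xi + (1.0 - theta) * yi for xi, yi in zip(x, y)]
    return theta * logsumexp(x) + (1.0 - theta) * logsumexp(y) - logsumexp(z)

random.seed(1)
worst = float("inf")
for _ in range(1000):
    n = random.randint(1, 8)
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    theta = random.random()
    worst = min(worst, convexity_gap(x, y, theta))

# Convexity predicts a non-negative gap for every sample (up to rounding).
print(worst >= -1e-9)
```

When $y = x + c\,\mathbf{1}$ for a constant $c$, the gap is zero, since $f$ is linear along that direction.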


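Since $f$ is continuous, it is enough to show midpoint convexity: $$\frac{1}{2} \log \left(\sum \exp x_i\right) + \frac{1}{2}\log \left(\sum \exp y_i\right)\ge \log \left(\sum \exp\frac{x_i+y_i}{2}\right)$$ or, exponentiating both sides and using the substitution $\exp\frac{x_i}{2} = a_i$, $\exp\frac{y_i}{2} = b_i$, $$\left(\sum a_i^2\right)^{\frac{1}{2}}\left(\sum b_i^2\right)^{\frac{1}{2}}\ge \sum a_i b_i$$ which is the Cauchy-Schwarz inequality.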


Another way to prove the convexity of this function is to use the defining inequality of convexity (the two-point form of Jensen's inequality): a function $f$ is convex if and only if, for all $X$, $Y$ in its domain and all $t \in [0,1]$,

$$f(tX+(1-t)Y) \le t f(X) + (1-t)f(Y)$$

Now let $X$ be represented by the vector $({X_1, X_2, X_3,... X_n})$,

and let $Y$ be represented by the vector $({Y_1, Y_2, Y_3,... Y_n})$

Let $t = \dfrac{1}{2}$ (for a continuous function such as $f$, checking the midpoint case is sufficient).

$$f(tX+(1-t)Y) = \log\left(\sum_{i=1}^{n} e^{\frac{X_i+Y_i}{2}}\right)$$

$$\text{RHS} = \frac{1}{2} \log\left(\sum_{i = 1}^{n} e^{X_i}\right)+ \frac{1}{2} \log\left(\sum_{i = 1}^{n} e^{Y_i}\right)$$

$$\text{RHS} = \frac{1}{2} \log\left(\sum_{i = 1}^{n} e^{X_i}\sum_{i = 1}^{n} e^{Y_i}\right)$$

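Multiplying both sides by 2 and exponentiating, the claim becomes $$\left(\sum_{i=1}^{n} e^{\frac{X_i+Y_i}{2}}\right)^2 \le \sum_{i=1}^{n} e^{X_i}\sum_{j=1}^{n} e^{Y_j}.$$ Both sides expand into $n^2$ terms indexed by pairs $(i,j)$: the left side gives $e^{\frac{X_i+Y_i+X_j+Y_j}{2}}$ and the right side gives $e^{X_i+Y_j}$. The diagonal terms ($i=j$) agree, and for $i \ne j$ the AM-GM inequality gives $e^{X_i+Y_j}+e^{X_j+Y_i} \ge 2e^{\frac{X_i+Y_i+X_j+Y_j}{2}}$, so the cross-product terms of the RHS dominate those of the LHS. Hence RHS $\ge$ LHS and the function is convex.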
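To make the comparison of cross-product terms concrete, here is a small sketch that expands the square of the exponentiated LHS and the product inside the RHS logarithm into terms indexed by $(i,j)$ and compares them in symmetric pairs. The function names and the random test vectors are assumptions made for illustration only.

```python
import math
import random

def squared_lhs_terms(X, Y):
    """Terms of (sum_i exp((X_i+Y_i)/2))^2, indexed by the pair (i, j)."""
    n = len(X)
    return {(i, j): math.exp((X[i] + Y[i] + X[j] + Y[j]) / 2.0)
            for i in range(n) for j in range(n)}

def product_rhs_terms(X, Y):
    """Terms of (sum_i exp(X_i)) * (sum_j exp(Y_j)), indexed by the pair (i, j)."""
    n = len(X)
    return {(i, j): math.exp(X[i] + Y[j])
            for i in range(n) for j in range(n)}

random.seed(2)
X = [random.uniform(-2, 2) for _ in range(4)]
Y = [random.uniform(-2, 2) for _ in range(4)]
lhs = squared_lhs_terms(X, Y)
rhs = product_rhs_terms(X, Y)

# Compare the (i, j) and (j, i) terms together; a single RHS term alone
# need not dominate the matching LHS term, but each symmetric pair does.
pairs_ok = all(rhs[i, j] + rhs[j, i] >= lhs[i, j] + lhs[j, i] - 1e-12
               for (i, j) in lhs)
print(pairs_ok, sum(rhs.values()) >= sum(lhs.values()))
```

Both expansions have the same number of terms ($n^2$), so the inequality comes from the size of the paired cross terms rather than from their count.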