A differentiable approximation to the minimum function

A smooth approximation is $f(x) = -\frac{1}{\rho}\log \sum_i e^{-\rho x_i} $. The larger $\rho>0$, the closer the approximation is to the minimum.
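For reference, here is a minimal NumPy sketch of this soft-min (the name `soft_min` is mine); subtracting the smallest element before exponentiating keeps the $e^{-\rho x_i}$ terms from overflowing:

```python
import numpy as np

def soft_min(x, rho=10.0):
    """Smooth min: -(1/rho) * log(sum_i exp(-rho * x_i))."""
    x = np.asarray(x, dtype=float)
    m = x.min()  # shift so the largest exponent is 0 (avoids overflow)
    return m - np.log(np.exp(-rho * (x - m)).sum()) / rho

print(soft_min([1.0, 2.0, 3.0], rho=10.0))  # ~0.999995, slightly below min
```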


If signs aren't a big deal (the $x_i$ need to be positive), use the generalized mean formula

$$ \left(\frac{1}{n}\sum x_i^k\right)^{1/k} $$

for $k\to -\infty$.
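A direct sketch of this power mean for positive inputs, with a large negative `k` standing in for $k \to -\infty$ (the function name is mine):

```python
import numpy as np

def power_mean(x, k):
    """Generalized mean ((1/n) * sum_i x_i**k) ** (1/k); requires x_i > 0."""
    x = np.asarray(x, dtype=float)
    return np.mean(x ** k) ** (1.0 / k)

print(power_mean([1.0, 2.0, 3.0], k=-50.0))  # ~1.022, -> min(x) = 1 as k -> -inf
```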


I would use $$f(x) = -\frac{1}{\rho}\log \frac{1}{N} \sum_{i=1}^N e^{-\rho x_i},$$ which approaches $f(x) \rightarrow \min_i x_i$ as $\rho \rightarrow +\infty$.
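A quick numerical check of that limit, computed in log space via SciPy's `logsumexp` (a sketch; `smooth_min` is my name for it):

```python
import numpy as np
from scipy.special import logsumexp

def smooth_min(x, rho):
    """-(1/rho) * log((1/N) * sum_i exp(-rho * x_i)), computed in log space."""
    x = np.asarray(x, dtype=float)
    return -(logsumexp(-rho * x) - np.log(len(x))) / rho

x = [1.0, 2.0, 3.0, 4.0, 5.0]
for rho in (1.0, 10.0, 100.0):
    print(rho, smooth_min(x, rho))  # approaches min(x) = 1.0 as rho grows
```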

There has been some debate about normalization, so let's compare these four: $$f_{A}(x) = -\frac{1}{\rho}\log \frac{1}{N}\sum_{i=1}^N e^{-\rho x_i} $$ $$f_{B}(x) = -\frac{1}{\rho}\log \sum_{i=1}^N e^{-\rho x_i} $$ $$f_C(x) = \left(\frac{1}{N}\sum_{i=1}^N x_i^{-\rho}\right)^{-1/\rho}$$ $$f_D(x) = \left(\sum_{i=1}^N x_i^{-\rho}\right)^{-1/\rho}$$

| $x$ | $\rho$ | $f_A$ | $f_B$ | $f_C$ | $f_D$ |
|---|---|---|---|---|---|
| $[1,2,3,4,5]$ | $10$ | $1.16$ | $1.0$ | $1.1745$ | $1.0$ |
| $[1,2,3,4,5]$ | $100$ | $1.016$ | $1.0$ | $1.0162$ | $1.0$ |
| $[1,1,1,1,1]$ | $10$ | $1$ | $0.8391$ | $1$ | $0.8513$ |
| $[1,1,1,1,1]$ | $100$ | $1$ | $0.9839$ | $1$ | $0.9840$ |
| $[0,1,10,100,1000]$ | $10$ | $0.1609$ | $0$ | $0$ | $0$ |
| $[0,1,10,100,1000]$ | $100$ | $0.0161$ | $0$ | $0$ | $0$ |

(For the last two rows, $f_C$ and $f_D$ can be evaluated using the conventions $0^{-\rho}=\infty$ and $\infty^{-1/\rho}=0$.)
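All of the values above can be reproduced with a short script like this (a sketch; A and B are computed in log space, and NumPy emits a divide-by-zero warning on the $x_i = 0$ rows but returns the intended $\infty$):

```python
import numpy as np
from scipy.special import logsumexp

def f_A(x, rho):  # normalized log-sum-exp, computed in log space
    x = np.asarray(x, dtype=float)
    return -(logsumexp(-rho * x) - np.log(len(x))) / rho

def f_B(x, rho):  # unnormalized log-sum-exp
    return -logsumexp(-rho * np.asarray(x, dtype=float)) / rho

def f_C(x, rho):  # normalized power mean; x_i = 0 gives inf ** (-1/rho) = 0
    return np.mean(np.asarray(x, dtype=float) ** (-rho)) ** (-1.0 / rho)

def f_D(x, rho):  # unnormalized power sum
    return np.sum(np.asarray(x, dtype=float) ** (-rho)) ** (-1.0 / rho)

for x in ([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], [0, 1, 10, 100, 1000]):
    for rho in (10.0, 100.0):
        print(x, rho, f_A(x, rho), f_B(x, rho), f_C(x, rho), f_D(x, rho))
```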

So as we would hope, when $\rho$ is large, all versions give reasonably good approximations to the minimum.

The question then is which is best when $\rho$ is not that big. This is an important question, since large $\rho$ can cause practical difficulties with finite-precision arithmetic. Comparing the four versions above, we can see that the mean versions (A and C) overestimate the minimum when $x$ has many values above the minimum, while the sum versions (B and D) underestimate the minimum when many of the values are equal to the minimum.

Which is better is ultimately a question for your application, but to me the mean versions give an answer that makes much more sense: the approximate minimum should be similar to the minimum, but pulled in the direction of all of the individual values in $x$. This is what happens with the mean versions (A and C), which also keep the approximate minimum inside the range of the data, i.e. $$\min(x) \le f_A(x), f_C(x) \le \max(x).$$

The sum versions (B and D), on the other hand, can give an approximate minimum that is lower than every value in the data (see the third and fourth rows of the table above). In other words, it is possible that $$f_B(x), f_D(x) < \min(x),$$ which is a property I want to avoid in an approximate minimum, since it makes that value very hard to interpret. So I find A and C more useful than B and D.
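A tiny demonstration of the two behaviours, reusing `f_A` and `f_B` from the script above:

```python
# Ties at the minimum drag the sum version below min(x),
# while the mean version stays inside the data range.
x = [2.0, 2.0, 2.0, 5.0]
print(f_A(x, 10.0))  # ~2.029: inside [min(x), max(x)]
print(f_B(x, 10.0))  # ~1.890: below min(x) = 2
```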

The last question is A vs. C. Version A is fairly stable as $x_i \rightarrow 0$, but version C runs out of precision: $x^{-100}$ overflows double precision below $x \approx 8.27\times 10^{-4}$ and overflows single precision below $x \approx 0.412$.
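That failure mode is easy to reproduce (a sketch, reusing `f_A` and `f_C` from the script above; the threshold is for IEEE doubles):

```python
import numpy as np

x = np.array([5e-4, 1.0])   # smallest entry is below the ~8.27e-4 threshold
print(x ** -100.0)          # first entry overflows to inf (NumPy warns)
print(f_C(x, 100.0))        # inf ** (-1/100) = 0.0 -- below every data value
print(f_A(x, 100.0))        # ~0.0074: finite and inside [min(x), max(x)]
```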

Therefore, at least in my own uses, $f_A$ gives the most useful approximation and is the most numerically stable.