Fast sigmoid algorithm

People here are mostly concerned with how fast one function is relative to another, and build microbenchmarks to see whether f1(x) runs 0.0001 ms faster than f2(x). The big problem is that this is mostly irrelevant: what matters is how fast your network learns with your activation function while trying to minimize your cost function.

According to current theory, the rectifier function and softplus, compared to the sigmoid function or similar activation functions, allow faster and more effective training of deep neural architectures on large and complex datasets.
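
For concreteness, a minimal C sketch of those two activations (the names relu and softplus are mine; softplus(x) = log(1 + e^x) is the standard smooth approximation of the rectifier max(0, x)):

#include <math.h>

/* Rectifier (ReLU): max(0, x) */
double relu(double x) {
    return x > 0.0 ? x : 0.0;
}

/* Softplus: smooth approximation of the rectifier, log(1 + e^x).
   Note: exp(x) overflows for large x; a robust version would branch on x. */
double softplus(double x) {
    return log1p(exp(x));
}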

So I suggest throwing away the micro-optimization and taking a look at which function allows faster learning (also looking at various other cost functions).


You don't have to use the actual, exact sigmoid function in a neural network algorithm; you can replace it with an approximated version that has similar properties but is faster to compute.

For example, you can use the "fast sigmoid" function

f(x) = x / (1 + abs(x))
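
A direct C translation might look like this. The raw form maps to (-1, 1) like tanh; the rescaled variant fast_sigmoid01, mapping to (0, 1) like the logistic sigmoid, is my addition:

#include <math.h>

/* "Fast sigmoid": x / (1 + |x|), range (-1, 1) */
double fast_sigmoid(double x) {
    return x / (1.0 + fabs(x));
}

/* Rescaled to (0, 1) to match the range of the logistic sigmoid */
double fast_sigmoid01(double x) {
    return 0.5 * (x / (1.0 + fabs(x)) + 1.0);
}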

Using the first terms of the series expansion for exp(x) won't help much if the arguments to f(x) are not near zero, and you have the same problem with a series expansion of the sigmoid function if the arguments are "large".
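
To see how quickly a truncated series breaks down, compare the exact sigmoid with its third-order Taylor polynomial around 0 (the coefficients 1/2, 1/4 and 1/48 are the standard expansion; the test points are arbitrary):

#include <stdio.h>
#include <math.h>

/* Exact logistic sigmoid */
double sigmoid(double x) {
    return 1.0 / (1.0 + exp(-x));
}

/* Truncated Taylor series around 0: 1/2 + x/4 - x^3/48 */
double sigmoid_taylor3(double x) {
    return 0.5 + x / 4.0 - x * x * x / 48.0;
}

int main(void) {
    for (double x = 0.5; x <= 4.0; x *= 2.0)
        printf("x=%.1f  exact=%.4f  taylor=%.4f\n",
               x, sigmoid(x), sigmoid_taylor3(x));
    return 0;
}

At x = 4 the exact value is about 0.982 while the truncated series gives about 0.167, so the approximation is only usable near zero.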

An alternative is to use table lookup. That is, you precalculate the values of the sigmoid function for a given number of data points, and then do fast (linear) interpolation between them if you want.
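
A sketch of that approach in C; the table size of 256 and the range [-8, 8] are arbitrary choices, and inputs outside the range are clamped to the table endpoints:

#include <math.h>

#define LUT_SIZE 256
#define LUT_MIN  (-8.0)
#define LUT_MAX  ( 8.0)

static double lut[LUT_SIZE];

/* Precompute sigmoid values on a uniform grid over [LUT_MIN, LUT_MAX] */
void sigmoid_lut_init(void) {
    for (int i = 0; i < LUT_SIZE; i++) {
        double x = LUT_MIN + (LUT_MAX - LUT_MIN) * i / (LUT_SIZE - 1);
        lut[i] = 1.0 / (1.0 + exp(-x));
    }
}

/* Linear interpolation between the two nearest precomputed points */
double sigmoid_lut(double x) {
    if (x <= LUT_MIN) return lut[0];
    if (x >= LUT_MAX) return lut[LUT_SIZE - 1];
    double t = (x - LUT_MIN) / (LUT_MAX - LUT_MIN) * (LUT_SIZE - 1);
    int i = (int)t;
    if (i >= LUT_SIZE - 1) return lut[LUT_SIZE - 1];
    return lut[i] + (t - i) * (lut[i + 1] - lut[i]);
}

Call sigmoid_lut_init() once at startup; a bigger table trades memory for accuracy.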


It's best to measure on your hardware first. A quick benchmark script shows that on my machine 1/(1+|x|) is the fastest, with tanh(x) a close second. The error function erf is pretty fast too.

% gcc -Wall -O2 -lm -o sigmoid-bench{,.c} -std=c99 && ./sigmoid-bench
atan(pi*x/2)*2/pi   24.1 ns
atan(x)             23.0 ns
1/(1+exp(-x))       20.4 ns
1/sqrt(1+x^2)       13.4 ns
erf(sqrt(pi)*x/2)    6.7 ns
tanh(x)              5.5 ns
x/(1+|x|)            5.5 ns

I expect that the results may vary depending on the architecture and the compiler used, but erf(x) (available since C99), tanh(x) and x/(1.0+fabs(x)) are likely to be the fastest performers.
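
For reference, a minimal sketch of the kind of timing harness that produces numbers like the above. The POSIX clock_gettime timing, the volatile sink that keeps the compiler from deleting the loop, and the input pattern are my choices; the actual script behind the table may differ:

#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <math.h>
#include <time.h>

#define N 10000000

static double logistic(double x) { return 1.0 / (1.0 + exp(-x)); }
static double fastsig(double x)  { return x / (1.0 + fabs(x)); }
static double tanhsig(double x)  { return tanh(x); }

/* Average ns per call of f over N evaluations on inputs in [-10, 10) */
static double bench(double (*f)(double)) {
    volatile double sink = 0.0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++)
        sink += f((double)(i % 2000 - 1000) / 100.0);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / N;
}

int main(void) {
    printf("1/(1+exp(-x))  %5.1f ns\n", bench(logistic));
    printf("tanh(x)        %5.1f ns\n", bench(tanhsig));
    printf("x/(1+|x|)      %5.1f ns\n", bench(fastsig));
    return 0;
}

Compile with something like gcc -O2 -std=c99 -o bench bench.c -lm. Note that calling through a function pointer can inhibit inlining, so absolute numbers will differ from a benchmark that inlines each formula.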