Looking for a function that approximates a parabola

I would suggest just approximating the $\max(0,\cdot)$ function, and then using that to implement $\max(0,1-x^2)$. This is a very well-studied problem, since $\max(0,\cdot)$ is the relu function which is currently ubiquitous in machine learning applications. One possibility is $$ \max(0,y) \approx \mu(w)(y) = \frac{y}2 + \sqrt{\frac{y^2}4 + w} $$ soft clip

One way of deriving this formula: it's the positive-range inverse of $x\mapsto x-\tfrac{w}x$.

Then, composed with the quadratic, this looks thus:

softclipped parabola

Notice how unlike Calvin Khor's suggestions, this avoids ever going negative, and is easier to adopt to other parabolas.


Nathaniel remarks that this approach does not preserve the height of the peak. I don't know if that matters at all – but if it does, the simplest fix is to just rescale the function with a constant factor. That requires however knowing the actual maximum of the parabola itself (in my case, 1).

Rescaled to preserve peak height

To get an even better match to the original peak, you can define a version of $\mu$ whose first two Taylor coefficients around 1 are both 1 (i.e. like the identity), by rescaling both the result and the input (this exploits the chain rule): $$\begin{align} \mu_{1(1)}(w,y) :=& \frac{\mu(w,y)}{\mu(1)} \\ \mu_{2(1)}(w,y) :=& \mu_{1(1)}\left(w, 1 + \frac{y - 1}{\mu'_{1(1)}(w,1)}\right) \end{align}$$

The Taylor-correcting relu approximation

And with that, this is your result:

Taylor-corrected relued-parabola approx

The nice thing about this is that Taylor expansion around 0 will give you back the exact original (un-restricted) parabola.


Source code (Haskell with dynamic-plot):

import Graphics.Dynamic.Plot.R2
import Text.Printf

μ₁₁, μ₁₁', μ₂₁ :: Double -> Double -> Double
μ₁₁ w y = (y + sqrt (y^2 + 4*w))
             / (1 + sqrt(1 + 4*w))
μ₁₁' w y = (1 + y / sqrt (y^2 + 4*w))
             / (1 + sqrt(1 + 4*w))
μ₂₁ w y = μ₁₁ w $ 1 + (y-1) / μ₁₁' w 1

q :: Double -> Double
q x = 1 - x^2

main :: IO ()
main = do
 plotWindow
  [ plotLatest [ plot [ legendName (printf "μ₂₍₁₎(w,q(x))")
                         . continFnPlot $ μ₂₁ w . q
                      , legendName (printf "w = %.2g" w) mempty
                      ]
               | w <- (^1500).recip<$>[1,1+3e-5..]]
  , legendName "max(0,q x)" . continFnPlot
       $ max 0 . q, xAxisLabel "x"
  , yInterval (0,1.5)
  , xInterval (-1.3,1.3) ]
 return ()

Here is a family of real analytic functions that does the job; making the parameter $N=1,2,3,4,\dots$ larger makes the approximation better. Lets just say the quadratic is $q(x) = 1-x^2$, so the function is $f(x)=\max(0,q(x))$. Its easy to do a more general case. Then set $$f_N(x) = \frac1{1+x^{2N}}q(x).$$ This works because for $N$ very big, when $|x|<1$, $x^{2N}\approx 0$, so $\frac1{1+x^{2N}}\approx 1$. But the moment $|x|>1$, $x^{2N}$ becomes huge, so then $\frac1{1+x^{2N}} \approx 0$. It's not exactly 0 outside $|x|<1$, but instead decays like $O(|x|^{-2N+2})$ (again, crank up $N$ to get a better approximation)

This is what it looks like for $N=10$: enter image description here

The general case: the quadratic with height $h$ above $y=c$, base $d$ centered around $x_0$ is $$ q_{h,c,d,x_0}(x)=h q\left(\frac{x-x_0}{d/2}\right)+c.$$ The quadratic truncated at $y=c$ is then $$ f_{h,c,d,x_0}(x) = h f\left(\frac{x-x_0}{d/2}\right)+c,$$ and the corresponding approximation I am proposing is $$ f_{N,h,c,d,x_0}(x) = h f_N\left(\frac{x-x_0}{d/2}\right)+c.$$

You can adjust these parameters in this interactive Desmos graph.

A further variation is using $1/(1-x^{2N})$ instead to define $f_N$; this makes $f_N$ everywhere positive (because it changes sign exactly where $q$ does, and $f_N$ is differentiable at the truncation points by l'Hopital), which could be useful. But the approximation is worse near the boundary.

Here's a graph of this last variant-enter image description here

The first variant is exactly 1 at $x=0$, and exactly 0 at $x=\pm 1$. It has the same sign as the original polynomial $1-x^2$ everywhere.

The second is exact only at $x=0$, (but even if $N$ is small), and is everywhere positive. (In particular, its a slight mischaracterisation to say that "unlike Calvin Khor's suggestions, this avoids ever going negative" :) but I didn't include a graph of this variant earlier, and their solution is great too!)


As an expansion of Calvin Khor's answer, you need to generate a suitable approximation to the square pulse function, which can be done in a number of different ways.

Consider that the error function goes from -1 to 1, with a relatively rapid transition from one side to the other. Using this function, we can write $$ f_a(x)=\frac{\text{erf}(ax)+1}2-\frac{\text{erf}(a(x-1))+1}2=\frac{\text{erf}(ax)-\text{erf}(a(x-1))}2 $$ Here is what that function looks like for $a=5$ (red), $a=50$ (blue), and $a=500$ green):

Approximating the Square Pulse

We can translate this as needed, then multiply by our required polynomial, in order to get our fit. Fitting $(1-x^2)$ leads us to $$ f(x)\approx (1-x^2)f_a(2x+1) $$ which, for the same values of $a$, gives

Approximating our function

Any sequence that converges towards the square pulse will work. Instead of $\text{erf}$, you could use $\tan^{-1}$ (with appropriate rescaling), for example, or $\tanh$.