When is the derivative of an inverse function equal to the reciprocal of the derivative?

Assume $g(f(x))=x$. Then $$g'(f(x))f'(x)=1$$ and then $$g'(f(x))=\frac1{f'(x)}$$

Note that we also need $f'(x)\neq 0$. All the conditions (the injectivity and differentiability of $f$, and that $f'$ does not vanish) must hold in a neighbourhood of the point where you are differentiating; that is, this works locally.

See the inverse function theorem.
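
As a quick numerical sanity check of the identity $g'(f(x))=\frac1{f'(x)}$, here is a minimal sketch. It assumes the concrete pair $f=\exp$, $g=\log$ (chosen only for illustration, not taken from the answer), and the helper `central_diff` with its step size `h` is hypothetical.

```python
import math

def central_diff(func, a, h=1e-6):
    """Central-difference estimate of func'(a); h is an assumed step size."""
    return (func(a + h) - func(a - h)) / (2 * h)

def check_inverse_rule(x):
    """Return (g'(f(x)), 1/f'(x)) for the assumed pair f = exp, g = log."""
    lhs = central_diff(math.log, math.exp(x))
    rhs = 1.0 / central_diff(math.exp, x)
    return lhs, rhs

# Compare both sides at a few sample points.
for x in (-1.0, 0.0, 2.0):
    lhs, rhs = check_inverse_rule(x)
    print(x, lhs, rhs)
```

The two printed columns should agree up to finite-difference error; this illustrates the formula but of course proves nothing.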


The answers so far are arguably incorrect; they merely give sufficient but not necessary conditions, and one of them even states that their conditions are necessary. We do not need differentiability in some (open) neighbourhood of the point, even for the conventional (very restrictive) definition of derivative. Furthermore, if we work with a natural generalized definition of derivative, we do not even need one-to-one correspondence between the values of $x$ and values of $y$ near the point, for the derivative to exist there. I shall first state and prove the general fact, and then give examples that refute the necessity of these two conditions. $ \def\less{\smallsetminus} \def\rr{\mathbb{R}} \def\lfrac#1#2{{\large\frac{#1}{#2}}} $

Theorem

If $\lfrac{dy}{dx}$ exists and is not zero, then $\lfrac{dx}{dy}$ exists and is the reciprocal.

This holds in any framework where $\lfrac{dy}{dx}$ is the limit of $\lfrac{Δy}{Δx}$ as $Δt \to 0$ (undefined if the limit is undefined), where $x,y$ are variables that vary continuously with respect to some parameter $t$ (which could be $x$ itself). Here "$Δx$" denotes change in $x$ from a given point, and so "$Δt \to 0$" essentially captures the limiting behaviour as $t$ approaches (but does not reach) a certain value. This captures not only the usual situations such as derivatives of functions, but also allows simple yet rigorous implicit differentiation even for constraints that are not locally bijective.

(See below for notes justifying this framework.)

Proof

Take any variables $x,y$ varying with parameter $t$.

Take any point where $\lfrac{dy}{dx} \in \rr \less \{0\}$.

As $Δt \to 0$:

$\lfrac{Δy}{Δx} \approx \lfrac{dy}{dx} \ne 0$.

  Thus $\lfrac{Δy}{Δx} \ne 0$ and hence $Δy \ne 0$ (eventually).

  Thus $\lfrac{Δx}{Δy} = (\lfrac{Δy}{Δx})^{-1} \approx (\lfrac{dy}{dx})^{-1}$.

Therefore $\lfrac{dx}{dy} = (\lfrac{dy}{dx})^{-1}$.

Example 1


Consider $f : \rr \to \rr$ such that $f(0) = 0$ and $f(x) = \lfrac{2}{\lfrac1x+2(1-(\frac1x\%2))}$ for every $x \in \rr \less \{0\}$, where "$x\%y$" is defined to mean "$x-\lfloor \lfrac{x}{y} \rfloor y$".

Then $f$ is a bijection from $\rr$ to $\rr$ and has gradient $2$ at $0$ but is clearly not differentiable on any open interval around $0$. Since $\lfrac{d(f(x))}{dx} = 2$, satisfying the condition I stated, $f^{-1}$ has gradient $\lfrac12$ at $0$.

Note that $f'(0)$ and ${f^{-1}}'(0)$ both exist even under the conventional definition of derivative, because $f$ is bijective, and $y=f(x)$ is squeezed between $y=\frac2{1/x+2}$ and $y=\frac2{1/x-2}$, which are tangent at the origin. So this provides a counter-example to the claim that we need differentiability in some open neighbourhood.
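
The behaviour in Example 1 can be probed numerically. The sketch below implements $f$ exactly as written there (with "$x\%y$" as $x-\lfloor x/y\rfloor y$), prints the difference quotient $f(h)/h$ for a few small $h$, and evaluates $f$ on both sides of $x=\frac14$, one of the points where $\frac1x$ is an even integer and $f$ jumps. The function names `mod` and `quotient` and the sample points are my own assumptions.

```python
import math

def mod(x, y):
    """x % y as defined in the answer: x - floor(x / y) * y."""
    return x - math.floor(x / y) * y

def f(x):
    """The function from Example 1; f(0) = 0 by definition."""
    if x == 0:
        return 0.0
    u = 1.0 / x
    return 2.0 / (u + 2.0 * (1.0 - mod(u, 2.0)))

def quotient(h):
    """Difference quotient (f(h) - f(0)) / h at the origin."""
    return f(h) / h

# The quotient should approach 2 as h shrinks (from either side).
for h in (1e-1, 1e-2, 1e-3, -1e-3, 1e-4):
    print(h, quotient(h))

# f jumps at x = 1/4: compare values just left and just right of it.
eps = 1e-9
print(f(0.25 - eps), f(0.25 + eps))
```

Since such jump points accumulate at $0$, this is consistent with $f$ failing to be differentiable on any open interval around $0$ while still having gradient $2$ there.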

Example 2


Let $t$ be a real parameter and $x,y$ be variables varying with $t$ such that $(x,y) = (0,0)$ if $t = 0$ and $(x,y) = (t+2t^3\cos(\lfrac1{t^2}),t+2t^3\sin(\lfrac1{t^2}))$ if $t \ne 0$.

Then $\lfrac{dy}{dx} = \lfrac{dx}{dy} = 1$ when $t = 0$ despite the curve having no local bijection between the values of $x$ and the values of $y$ in any open ball around the origin!
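
For a numerical illustration of Example 2 (not a proof), the sketch below evaluates $\lfrac{Δy}{Δx}$ from the point $t=0$ for small $Δt$, and also evaluates $\lfrac{dx}{dt}$ at parameter values where $\sin(\lfrac1{t^2})=-1$. The formula in `dx_dt` is my own hand computation for $t \ne 0$, and all function names are hypothetical.

```python
import math

def x(t):
    return 0.0 if t == 0 else t + 2 * t**3 * math.cos(1 / t**2)

def y(t):
    return 0.0 if t == 0 else t + 2 * t**3 * math.sin(1 / t**2)

def ratio(dt):
    """Delta y / Delta x measured from t = 0."""
    return (y(dt) - y(0)) / (x(dt) - x(0))

def dx_dt(t):
    """Derivative of x with respect to t for t != 0 (computed by hand)."""
    return 1 + 6 * t**2 * math.cos(1 / t**2) + 4 * math.sin(1 / t**2)

def t_where_x_decreases(k):
    """A parameter value with sin(1/t^2) = -1, where dx/dt is about -3."""
    return 1 / math.sqrt(1.5 * math.pi + 2 * math.pi * k)

# The ratio should approach 1 as dt shrinks.
for dt in (1e-1, 1e-2, -1e-2, 1e-3):
    print(dt, ratio(dt))

# x is decreasing in t at points arbitrarily close to t = 0.
for k in (10, 1000, 100000):
    t = t_where_x_decreases(k)
    print(t, dx_dt(t))
```

The negative values of $\lfrac{dx}{dt}$ arbitrarily close to $t=0$ show that $x$ is not monotonic in $t$ near the origin, which fits the claim that the curve doubles back there even though the gradient at the origin is $1$.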

Notice that the conventional framework for real analysis cannot even state this fact that the curve has gradient $1$ at the origin! This is one kind of situation where the framework I am using is superior; another kind involves path integrals.

Notes

This framework is self-consistent and more general than the conventional one in 'elementary calculus' where you can only write "$\lfrac{dy}{dx}$" when $y$ is a function of $x$. If you think about it a little, you would realize that "function of $x$" is nonsense in the logical sense. In any standard foundational system, no object $y$ can be both a function and a real number. So it is utterly meaningless to say "$y$ is a function of $x$". Yet people write things like "$y = f(x)$ where $f$ is a function from $\rr$ to $\rr$". This is technically equally nonsensical, because either $x$ is previously defined and so $y$ is just a single real number, or $x$ is treated as a parameter and so $y$ is actually an expression in the language of the foundational system. Only in the latter case does it make sense to ask for the derivative of $y$ with respect to $x$, which is then also an expression; otherwise the question is senseless. If you are actually rigorous, you would find that many texts use ambiguous or inconsistent notation for this very reason.

However, the framework I used above is both rigorous and logically consistent. Specifically, when we say that a set of variables vary with a parameter $t$, it should be interpreted as that each variable is a function over the range of $t$, and every expression involving the variables denotes a function by interpreting "$t$" to be its input parameter and all operations to be pointwise. For example if we say that $x,y$ vary with $t \in \rr$, we should interpret $x,y$ as functions on $\rr$ and interpret expressions like "$xy+t$" to be the pointwise product of $x,y$ plus the input, namely $( \rr\ t \mapsto x(t)y(t)+t )$. Similarly we should interpret "$Δx$" to denote "$( \rr\ t \mapsto x(t+Δt)-x(t) )$", where "$Δt$" is interpreted as a free parameter with exactly the same role as "$h$" in "$\lim_{h \to 0} \lfrac{x(t+h)-x(t)}{h}$". Finally, we permit the evaluation of the variables at a given point, so for example we might say "when $x = 0$, ..." which should be interpreted as "for every $t$ such that $x(t) = 0$, ...".
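
One way to make this interpretation concrete is to model variables as plain functions of $t$. The sketch below is only an illustration under my own assumptions: the helpers `delta` and `ratio` are hypothetical names, and the choice $x=\cos t$, $y=\sin t$ (a point moving around the unit circle, where $y$ is not globally a function of $x$) is not from the text above.

```python
import math

def delta(var, t, dt):
    """Delta(var): change in var when the parameter moves from t to t + dt."""
    return var(t + dt) - var(t)

def ratio(num, den, t, dt):
    """Delta(num) / Delta(den) at parameter value t for a small nonzero dt."""
    return delta(num, t, dt) / delta(den, t, dt)

# Variables varying with t: a point moving around the unit circle.
x = math.cos
y = math.sin

# An expression in the variables is again a function of t (pointwise).
xy_plus_t = lambda t: x(t) * y(t) + t

t0 = math.pi / 3
for dt in (1e-2, 1e-4, 1e-6):
    # Approximations to dy/dx and dx/dy at t = t0.
    print(dt, ratio(y, x, t0, dt), ratio(x, y, t0, dt))
```

As $Δt$ shrinks, the two printed ratios should approach $-\frac1{\sqrt3}$ and $-\sqrt3$ respectively, which are reciprocals of each other.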

Also, we must make a distinction between "$→$" and "$≈$". "$x → c$" means "$x$ eventually stays close but not equal to $c$", while "$x ≈ y$" means "$x$ eventually stays close to $y$ (possibly equal)". You could express these via the typical ε-δ definition of limits, but it is easier to view them topologically; "$x ≈ y$ as $Δt → 0$" would mean "given any ball $B$ around $0$, $(x-y)(t+Δt)$ lies in $B$ for every $Δt$ in some sufficiently small punctured ball around $0$". (An alternative view that is equivalent under a weak choice principle is via sequential continuity; "$x ≈ y$ as $Δt → 0$" would mean "for every sequence $Δt$ that is eventually nonzero but converges to zero, the sequence $(x-y)(t+Δt)$ converges to zero".)

Now it is easy to check that my above definition of "$\lfrac{dy}{dx}$" is absolutely rigorous and not only matches the intuitive notion of gradient far better but also is far more general. In fact, as I've shown above, it is easier to translate intuitive arguments for properties of gradients into this framework. For example, the above proof is a direct translation of the symmetry of ratios.

Finally, this framework is built upon and hence completely compatible with standard real analysis, using no unnecessary set-theoretic axioms, unlike non-standard analysis. It also extends naturally to asymptotic notation.


From the definition of the derivative at a point you can see all the requirements. Let $f$ be injective on $[a,b]$ and let $f^{-1}$ be its inverse; then

$$[f^{-1}]'(c)=\lim_{y\to c}\frac{f^{-1}(y)-f^{-1}(c)}{y-c}$$

Now, because $f$ is a bijection from $[a,b]$ onto its image, every $y$ in that image can be written as $y=f(x)$ for a unique $x\in[a,b]$, so $f^{-1}(y)=f^{-1}(f(x))=x$. In the same way, $c=f(x_0)$ for some $x_0\in[a,b]$. Then

$$[f^{-1}]'(f(x_0))=\lim_{f(x)\to f(x_0)}\frac{x-x_0}{f(x)-f(x_0)}$$

Now: if $f^{-1}$ is continuous at $c$, then $$f(x)\to f(x_0)\implies x\to x_0$$ Hence

$$[f^{-1}]'(f(x_0))=\lim_{f(x)\to f(x_0)}\frac{x-x_0}{f(x)-f(x_0)}=\lim_{x\to x_0}\frac{x-x_0}{f(x)-f(x_0)}=\frac1{f'(x_0)}$$

If $f'(x_0)$ exists and is different from zero then the above is well-defined.
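
To see why the condition $f'(x_0)\neq 0$ matters, here is a small worked example (my own choice of $f$, not part of the argument above). Take $f(x)=x^3$ on $[-1,1]$, so $f$ is injective and $f^{-1}(y)=\sqrt[3]{y}$ is continuous. At $x_0=0$ we have $f'(0)=0$, and the same chain of limits gives

$$\lim_{x\to 0}\frac{x-0}{x^3-0}=\lim_{x\to 0}\frac1{x^2}=+\infty$$

so $[f^{-1}]'(0)$ does not exist as a real number. By contrast, at $x_0=1$ we have $f'(1)=3\neq 0$ and

$$[f^{-1}]'(f(1))=[f^{-1}]'(1)=\frac1{f'(1)}=\frac13$$

which agrees with differentiating $y^{1/3}$ directly, since $\frac13y^{-2/3}=\frac13$ at $y=1$.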