Why is the 2nd derivative written as $\frac{\mathrm d^2y}{\mathrm dx^2}$?

Somewhat mundanely,

$$ \frac{d}{dx}\left(\frac{d}{dx}(y)\right) = \frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{d\,dy}{dx\,dx} = \frac{d^2 y}{dx^2} $$


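Purely symbolically, if we accept that $dy = f'(x)\,dx$ and treat $dx$ as a constant (so that $d(dx) = 0$ and $dx$ can be pulled out of $d(\cdot)$), then $$d^2y = d(dy) = d(f'(x)\,dx) = dx\,d(f'(x)) = dx\,f''(x)\,dx = f''(x)\,(dx)^2,$$ so dividing by $(dx)^2$ yields $$\frac{d^2y}{(dx)^2} = \frac{d^2y}{dx^2} = f''(x),$$ where $dx^2$ in the denominator is shorthand for $(dx)^2$, not $d(x^2)$.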
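
As a quick sanity check of this symbolic manipulation, here is the same computation for one concrete function. The choice $y = x^3$ is only an illustration and is not part of the argument above; $dx$ is again treated as a constant.

$$\begin{aligned} dy &= 3x^2\,dx, \\ d^2y &= d(3x^2\,dx) = dx\,d(3x^2) = dx\,(6x\,dx) = 6x\,(dx)^2, \\ \frac{d^2y}{dx^2} &= 6x, \end{aligned}$$

which agrees with differentiating $x^3$ twice in the usual way.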

As to where this notation actually comes from, though: My guess is that it comes from a time when mathematicians primarily thought of $dx$ and $dy$ as "infinitesimal quantities." There are ways of doing so rigorously (via non-standard analysis), and perhaps there is a way of making this notation rigorous that way.


However, we can still give rigorous meaning to these calculations without appealing to non-standard analysis by using the language of bilinear forms.

If $f$ is differentiable, we can define a map
\begin{align*} df\colon \mathbb{R} & \to L(\mathbb{R}; \mathbb{R}) \\ df(x)(dx) & = f'(x)\,dx. \end{align*}
Here, $L(\mathbb{R};\mathbb{R})$ denotes the set of linear maps from $\mathbb{R}$ to $\mathbb{R}$, and $dx$ is simply a real number.

Going one step further, if $f$ is twice differentiable we can consider the map $$d^2f = d(df)\colon \mathbb{R} \to L(\mathbb{R};L(\mathbb{R};\mathbb{R})).$$ By identifying $L(\mathbb{R}; L(\mathbb{R}; \mathbb{R}))$ with the set of bilinear maps $B(\mathbb{R} \times \mathbb{R};\mathbb{R})$, we have the bilinear map $$d^2f(x)(dx^1, dx^2) = dx^1\, f''(x) \,dx^2,$$ where the superscripts on $dx^1$ and $dx^2$ are indices labelling two real numbers, not powers. Its associated quadratic form, obtained by setting $dx^1 = dx^2 = dx$, is $$d^2f(x)(dx, dx) = f''(x)\,(dx)^2.$$

Since $dx$ is an ordinary real number, it is now perfectly legal to divide both sides by $(dx)^2$ whenever $dx \neq 0$, obtaining $$\frac{d^2f}{dx^2} = f''(x).$$
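
To make the bilinear-form picture concrete, here is a small worked instance. The function $f(x) = \sin x$ and the numbers plugged in are chosen purely for illustration; as above, the superscripts on $dx^1, dx^2$ are indices, not powers.

$$\begin{aligned} df(x)(dx) &= \cos(x)\,dx, \\ d^2f(x)(dx^1, dx^2) &= -\sin(x)\,dx^1\,dx^2, \\ d^2f\!\left(\tfrac{\pi}{2}\right)(0.1,\,0.1) &= -1 \cdot (0.1)^2 = -0.01. \end{aligned}$$

Dividing the last value by $(dx)^2 = 0.01$ gives $-1 = f''\!\left(\tfrac{\pi}{2}\right)$, an ordinary division of real numbers.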


The $d$ is meant to represent "the change in", and the Leibniz notation is meant to remind you that you are computing the ratio between the change in $y$ and the change in $x$.

When you take the second derivative, you are computing how the derivative is changing as $x$ changes; that is, you are trying to compute $$\frac{d(y')}{dx}.$$ Now, $y'$ is itself a rate of change: it is the rate at which $y$ changes. So the "numerator" of the differential notation is telling you that you are trying to consider the change in the change in $y$, not the change in $y^2$ (which is what "$dy^2$" would represent).

So you are trying to describe the change in "the-change-in-$y$", relative to how $x$ is changing. $x$ is only changing "once", so you should have a single $d$ in the "denominator" (remember, not really a denominator). So why $x^2$? Because you are trying to figure out the change of blah as $x$ changes, and blah is a rate of change as $x$ changes as well. So you are taking $x$ twice, but considering only one change. Hence, single $d$, but $x$ squared.
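
The "change in the change in $y$" reading can be checked numerically. The following is only a sketch under my own assumptions: it uses finite forward differences with a small step `h` standing in for $dx$, the helper name `second_difference_quotient` is hypothetical, and the test function $y = x^3$ is chosen just for illustration.

```python
def second_difference_quotient(f, x, h):
    """Change in the change of f, divided by h squared (forward differences)."""
    first_change = f(x + h) - f(x)                 # change in y starting at x
    next_change = f(x + 2 * h) - f(x + h)          # change in y one step later
    change_of_change = next_change - first_change  # the change in "the change in y"
    return change_of_change / (h * h)              # one step size, used twice


def cube(x):
    # Illustrative choice: y = x^3, whose second derivative is 6x.
    return x ** 3


for h in (1e-1, 1e-2, 1e-3):
    # The quotient should approach 6 * 2 = 12 as h shrinks.
    print(h, second_difference_quotient(cube, 2.0, h))
```

The numerator involves a difference of differences (two $d$'s applied to $y$), while the denominator is the single step size multiplied by itself, which mirrors the shape of $\frac{d^2y}{dx^2}$.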