The second differential versus the differential of a differential form

I will answer both questions. As @md2perpe said in the other answer, one use of the second-order differential is in quadratic approximation. More generally, all of the higher-order differentials together make up a Taylor series, which (for analytic functions, at least locally) is not just an approximation but exact. Yet another use of differentials is to take care of the Chain Rule when performing a change of variables, although you have to be careful here: the second differential that you have doesn't do that, because it's missing a term. So that's two purposes. A third purpose, in theory, is local optimization, although I'm not sure how much it helps in practice.

Before I can explain these applications, I'll need the total second differential of $ y $ when $ y = f ( x ) $. The first differential is $ \mathrm d y = f ' ( x ) \, \mathrm d x $, which depends on both $ x $ and $ \mathrm d x $, so its differential has two terms, which we can find with the help of the Product Rule: $$ \eqalign { \mathrm d ^ 2 y = \mathrm d ( \mathrm d y ) & = \mathrm d \big ( f ' ( x ) \, \mathrm d x \big ) = \mathrm d \big ( f ' ( x ) \big ) \, \mathrm d x + f ' ( x ) \, \mathrm d ( \mathrm d x ) \\ & = \big ( f ' ' ( x ) \, \mathrm d x \big ) \, \mathrm d x + f ' ( x ) \, \mathrm d ^ 2 x = f ' ' ( x ) \, \mathrm d x ^ 2 + f ' ( x ) \, \mathrm d ^ 2 x \text . } $$ This is the correct rule if you want to do change of variables by substitution; that is, if $ x = g ( t ) $, so that $ y = ( f \circ g ) ( t ) $, then using $ \mathrm d x = g ' ( t ) \, \mathrm d t $ and $ \mathrm d ^ 2 x = g ' ' ( t ) \, \mathrm d t ^ 2 + g ' ( t ) \, \mathrm d ^ 2 t $, we get $$ \mathrm d y = f ' ( x ) \, \mathrm d x = f ' \big ( g ( t ) \big ) \big ( g ' ( t ) \, \mathrm d t \big ) = f ' \big ( g ( t ) \big ) g ' ( t ) \, \mathrm d t \text , $$ so $ ( f \circ g) ' ( t ) = f ' \big ( g ( t ) \big ) g ' ( t ) $, which you know is correct; and also $$ \eqalign { \mathrm d ^ 2 y & = f ' ' ( x ) \, \mathrm d x ^ 2 + f ' ( x ) \, \mathrm d ^ 2 x = f ' ' \big ( g ( t ) \big ) \big ( g ' ( t ) \, \mathrm d t \big ) ^ 2 + f ' \big ( g ( t ) \big ) \big ( g ' ' ( t ) \, \mathrm d t ^ 2 + g ' ( t ) \, \mathrm d ^ 2 t \big ) \\ & = \Big ( f ' ' \big ( g ( t ) \big ) g ' ( t ) ^ 2 + f ' \big ( g ( t ) \big ) g ' ' ( t ) \Big ) \, \mathrm d t ^ 2 + f ' \big ( g ( t ) \big ) g ' ( t ) \, \mathrm d ^ 2 t \text , } $$ so $ ( f \circ g ) ' ' ( t ) = f ' ' \big ( g ( t ) \big ) g ' ( t ) ^ 2 + f ' \big ( g ( t ) \big ) g ' ' ( t ) $, which is less famous but also correct. This is probably the most common application.
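If you want to check that less famous formula concretely, here is a small symbolic sketch (Python with SymPy; the particular choices $ f = \sin $ and $ g ( t ) = t ^ 3 + t $ are hypothetical, and any twice-differentiable functions would do):

```python
# Sketch: verify (f o g)''(t) = f''(g(t)) g'(t)^2 + f'(g(t)) g''(t)
# for one hypothetical choice of f and g.
import sympy as sp

t, u = sp.symbols('t u')
f = sp.sin(u)        # y = f(x), written as an expression in u
g = t**3 + t         # x = g(t)

lhs = sp.diff(f.subs(u, g), t, 2)                        # (f o g)''(t), computed directly
fp  = sp.diff(f, u).subs(u, g)                           # f'(g(t))
fpp = sp.diff(f, u, 2).subs(u, g)                        # f''(g(t))
rhs = fpp * sp.diff(g, t)**2 + fp * sp.diff(g, t, 2)     # the formula from the second differential

print(sp.simplify(lhs - rhs))                            # prints 0
```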

Now if you want to, you can partially evaluate the second differential $ \mathrm d ^ 2 y $ when $ \mathrm d ^ 2 x = 0 $, getting a partial second differential showing only the dependence on $ x $ and not on $ \mathrm d x $ (that is, treating $ \mathrm d x $ as constant): $$ ( \partial ^ 2 y ) _ { \mathrm d x } = \mathrm d ^ 2 y \rvert _ { \mathrm d ^ 2 x = 0 } = f ' ' ( x ) \, \mathrm d x ^ 2 \text . $$ Then if you divide by $ \mathrm d x $, you could call that the partial derivative of $ \mathrm d y $ with respect to $ x $; but since $ \mathrm d y $ is itself a differential, we usually divide by $ \mathrm d x $ again to get the second derivative of $ y $ with respect to $ x $. Since $ y $ depends only on $ x $, this is really a total second derivative, which is why people usually write it as $ \mathrm d ^ 2 y / \mathrm d x ^ 2 $, even though it's not literally the quotient of $ \mathrm d ^ 2 y $ and $ \mathrm d x ^ 2 $, in contrast to the first derivative. (You could fairly write it as $ \partial ^ 2 y / \partial x ^ 2 $, or even $ ( \partial ^ 2 y / \partial x ^ 2 ) _ { \mathrm d x } $ to indicate what is held fixed, but this is unlikely to catch on; or if you want to be both pedantic and understood, you can still write $ ( \mathrm d / \mathrm d x ) ^ 2 y $.) Partial though it is, this second differential does have its uses, as in the quadratic approximation to $ f $ at $ a $: $$ Q ( x ) = f ( a ) + f ' ( a ) ( x - a ) + \frac 1 2 f ' ' ( a ) ( x - a ) ^ 2 = \Big ( y + \mathrm d y + \frac 1 2 \mathrm d ^ 2 y \Big ) \Big \rvert _ { x = a , \, \mathrm d x = x - a , \, \mathrm d ^ 2 x = 0 } \text . $$ More generally, we have the Taylor series of $ f $ at $ a $: $$ T ( x ) = \sum _ { n = 0 } ^ \infty \frac 1 { n ! } f ^ { ( n ) } ( a ) ( x - a ) ^ n = \sum _ { n = 0 } ^ \infty \frac 1 { n ! } \mathrm d ^ n y \Big \rvert _ { x = a , \, \mathrm d x = x - a , \, \mathrm d ^ k x = 0 \, \text {for} \, k \geq 2 } \text . $$ And if $ f $ is analytic at $ a $, then $ T ( x ) $ converges to $ f ( x ) $, at least on some neighbourhood of $ a $.
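To make the quadratic approximation concrete, here is a short SymPy sketch (the choice $ f ( x ) = e ^ x $ and the base point $ a = 0 $ are hypothetical, just for illustration):

```python
# Sketch: build Q(x) = (y + dy + (1/2) d^2 y) at x = a, dx = x - a, d^2 x = 0,
# and compare it with f near the base point.
import sympy as sp

x, a = sp.symbols('x a')
f = sp.exp(x)

Q = ( f.subs(x, a)
      + sp.diff(f, x).subs(x, a) * (x - a)
      + sp.Rational(1, 2) * sp.diff(f, x, 2).subs(x, a) * (x - a)**2 )

print(sp.expand(Q.subs(a, 0)))                               # x**2/2 + x + 1
print(float(f.subs(x, 0.1)), float(Q.subs({a: 0, x: 0.1})))  # 1.10517..., 1.105
```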

Another potential application is local optimization. Normally we say that $ f $ has a local minimum at $ a $ only if $ f ' ( a ) = 0 $ and $ f ' ' ( a ) \geq 0 $ (assuming that $ f $ is twice-differentiable at $ a $ and defined on a neighbourhood of $ a $) and that $ f $ has a local minimum at $ a $ if $ f ' ( a ) = 0 $ and $ f ' ' ( a ) > 0 $. In higher dimensions, the parts about $ f ' ' $ are generalized to saying that the Hessian matrix is positive (semi)definite, but the parts about $ f ' $ are also still there (referring to the gradient vector). But you can combine each of these into a single statement: $ y $ has a minimum at $ x = a $ only if $ \mathrm d ^ 2 y \rvert _ { x = a } \geq 0 $ for all nonzero values (hence all values) of $ \mathrm d x $ and $ \mathrm d ^ 2 x $ (a kind of positive semidefiniteness), and $ y $ has a minimum at $ x = a $ if $ \mathrm d ^ 2 y \rvert _ { x = a } > 0 $ for all nonzero values of $ \mathrm d x $ and $ \mathrm d ^ 2 x $ (a kind of positive definiteness). This works unchanged in higher dimensions (replacing $ x $ with a point in $ \mathbb R ^ n $) as long as the function is twice-differentiable, and it can even handle points on the boundary of the domain if you're careful. I'm not sure how useful this is, because you still have to pull out the gradient vector and the Hessian matrix to analyse it with the tools of linear algebra, but it's a nice way to think about it.
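If you want to see the combined statement in action, here is a minimal numerical sketch in two dimensions (the function $ f ( x _ 1 , x _ 2 ) = x _ 1 ^ 2 + 3 x _ 2 ^ 2 $ and the candidate point are hypothetical; the point is only that checking $ \mathrm d ^ 2 y \geq 0 $ over arbitrary velocities and accelerations packages the gradient and Hessian conditions together):

```python
# Sketch: d^2 y = dx^T H dx + grad f . d^2 x at a candidate point a.
# If this is >= 0 for every velocity dx and acceleration d^2 x, then the gradient
# must vanish and the Hessian must be positive semidefinite.
import numpy as np

def grad(p):                         # gradient of f(x1, x2) = x1^2 + 3 x2^2
    return np.array([2.0 * p[0], 6.0 * p[1]])

def hessian(p):                      # Hessian of the same f (constant here)
    return np.array([[2.0, 0.0],
                     [0.0, 6.0]])

a = np.array([0.0, 0.0])             # candidate point
rng = np.random.default_rng(0)

ok = True
for _ in range(1000):
    dx  = rng.standard_normal(2)     # an arbitrary velocity
    d2x = rng.standard_normal(2)     # an arbitrary acceleration
    d2y = dx @ hessian(a) @ dx + grad(a) @ d2x
    ok = ok and (d2y >= 0.0)

print(ok)                            # True, consistent with a local minimum at a
```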


Now to show the connection to differential forms, I want to say something about what $ \mathrm d ^ 2 x $, $ \mathrm d x ^ 2 $, and so forth really mean. As you probably know, one way to think of an exterior differential form is as a multilinear antisymmetric operation on tangent vectors. These expressions are not exterior differential forms, but we can still think of them as generalized differential forms, giving operations on tangent vectors that are not necessarily multilinear or antisymmetric.

So if you're working in $ \mathbb R ^ 2 $ (where a tangent vector at a given point is essentially just another point in $ \mathbb R ^ 2 $), with $ x $ and $ y $ as the standard coordinate functions, then the differential form $ 2 x \, \mathrm d x + 3 y ^ 2 \, \mathrm d y $ at a point $ ( x _ 0 , y _ 0 ) $ takes a vector $ ( v _ x , v _ y ) $ and returns $ 2 x _ 0 v _ x + 3 y _ 0 ^ 2 v _ y $. And the differential form $ x ^ 2 \mathrm d x \wedge \mathrm d y $ at a point $ ( x _ 0 , y _ 0 ) $ takes two vectors, $ ( v _ x , v _ y ) $ and $ ( w _ x , w _ y ) $, and returns $ x _ 0 ^ 2 ( v _ x w _ y - v _ y w _ x ) $ (or half that, depending on your convention). Similarly, the generalized differential form $ \sqrt { d x ^ 2 + d y ^ 2 } $ at a point $ ( x _ 0 , y _ 0 ) $ takes a vector $ ( v _ x , v _ y ) $ and returns $ \sqrt { v _ x ^ 2 + v _ y ^ 2 } $. This is not linear, but it still makes sense. And you can even define what it means to integrate this form along a curve and prove that the value of the integral is the arclength of the curve. So there is no reason that you cannot perform arbitrary operations on differentials.
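Here is a small numerical sketch of that last claim (Python with NumPy; the unit circle is a hypothetical test curve): evaluate the generalized form $ \sqrt { \mathrm d x ^ 2 + \mathrm d y ^ 2 } $ on the velocity of a parametrization and integrate in the parameter, and you recover the arclength.

```python
# Sketch: integrate sqrt(dx^2 + dy^2) along the unit circle; the answer
# should be close to 2*pi.
import numpy as np

def form(v):                                   # sqrt(dx^2 + dy^2) applied to a tangent vector
    return np.sqrt(v[0]**2 + v[1]**2)

t = np.linspace(0.0, 2.0 * np.pi, 10001)
x, y = np.cos(t), np.sin(t)                    # parametrized unit circle
vx, vy = np.gradient(x, t), np.gradient(y, t)  # velocity components along the curve

speeds = form((vx, vy))
arclength = np.sum(0.5 * (speeds[1:] + speeds[:-1]) * np.diff(t))   # trapezoid rule
print(arclength)                               # about 6.2832 = 2*pi
```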

As for $ \mathrm d ^ 2 x $ and $ \mathrm d ^ 2 y $, these also simply return the $ x $- or $ y $-component of a vector, only the interpretation of this vector is different. That is, just as $ x $ and $ y $ return the $ x $- and $ y $-coordinates of a point thought of as representing position, and $ \mathrm d x $ and $ \mathrm d y $ return the $ x $- and $ y $-components of a vector thought of as representing velocity, so $ \mathrm d ^ 2 x $ and $ \mathrm d ^ 2 y $ return the $ x $- and $ y $-components of a vector thought of as representing acceleration, and so on. This is a little more subtle on a more general manifold, but if you work in local coordinates, then you don't really have to pay attention to the subtleties as long as your higher differentials respect the Chain Rule. So if $ y = f ( x ) $, then the second differential $ \mathrm d ^ 2 y = f ' ' ( x ) \, \mathrm d x ^ 2 + f ' ( x ) \, \mathrm d ^ 2 x $ at a point $ x = x _ 0 $ takes a velocity $ v $ and an acceleration $ a $ and returns $ f ' ' ( x _ 0 ) v ^ 2 + f ' ( x _ 0 ) a $, and similarly in more dimensions.
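As a sanity check on that last formula, here is a numerical sketch (the function $ f = \sin $ and the trajectory $ x ( t ) = t ^ 2 + t $ are hypothetical): feeding the second differential the velocity and acceleration of a moving point reproduces the second time derivative of $ y $ along the motion.

```python
# Sketch: compare f''(x0) v^2 + f'(x0) a with the directly computed
# second time derivative of y(t) = f(x(t)), using centred finite differences.
import numpy as np

f, fp, fpp = np.sin, np.cos, lambda x: -np.sin(x)

x_of_t = lambda t: t**2 + t                    # the moving point
y_of_t = lambda t: f(x_of_t(t))
t0, h = 0.3, 1e-4

x0 = x_of_t(t0)
v  = (x_of_t(t0 + h) - x_of_t(t0 - h)) / (2 * h)                  # velocity x'(t0)
a  = (x_of_t(t0 + h) - 2 * x_of_t(t0) + x_of_t(t0 - h)) / h**2    # acceleration x''(t0)

d2y_form   = fpp(x0) * v**2 + fp(x0) * a                          # the second differential
d2y_direct = (y_of_t(t0 + h) - 2 * y_of_t(t0) + y_of_t(t0 - h)) / h**2
print(d2y_form, d2y_direct)                    # the two agree to several decimal places
```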

Now, there is another possible version of the second differential, which to avoid ambiguity I will write as $ \mathrm d \otimes \mathrm d x $ or $ \mathrm d ^ { \otimes 2 } x $ for short. But first I should say what $ \mathrm d x \otimes \mathrm d x $ or $ \mathrm d x \otimes \mathrm d y $ means. This is, like the exterior form $ \mathrm d x \wedge \mathrm d y $, an operation that acts on two tangent vectors (at a given point); $ \mathrm d x \otimes \mathrm d x $ multiplies their $ x $-components together, and $ \mathrm d x \otimes \mathrm d y $ multiplies the $ x $-component of the first vector by the $ y $-component of the second vector. (Then $ \mathrm d x \wedge \mathrm d y $ itself is $ \mathrm d x \otimes \mathrm d y - \mathrm d y \otimes \mathrm d x $, or half that, depending on your convention.) This is multilinear, but it's not antisymmetric, so it's not an exterior differential form, but it's still a generalized differential form. Note that now both vectors represent a velocity, but they represent velocities along two different curves, or along two edges of a parallelogram (or triangle). Then $ \mathrm d ^ { \otimes 2 } x $ is another vector, still a kind of acceleration, but it indicates how the first velocity vector changes when moving in the direction of the second velocity vector (or how the second changes when moving in the direction of the first, which on an infinitesimal level is the same, essentially because of Schwarz's Theorem). Now if $ y = f ( x ) $, we have $$ \mathrm d \otimes \mathrm d y = f ' ' ( x ) \, \mathrm d x \otimes \mathrm d x + f ' ( x ) \, \mathrm d \otimes \mathrm d x \text , $$ which at a point $ x = x _ 0 $ takes two velocities $ v _ 1 $ and $ v _ 2 $ and an acceleration $ a $ and returns $ f ' ' ( x _ 0 ) v _ 1 v _ 2 + f ' ( x _ 0 ) a $.
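Here is a tiny sketch of the difference between the tensor and wedge products acting on two tangent vectors in $ \mathbb R ^ 2 $ (plain Python; the particular vectors are hypothetical):

```python
# Sketch: dx (x) dy versus dx ^ dy applied to two tangent vectors v and w at a point.
v = (2.0, 5.0)    # first tangent vector  (v_x, v_y)
w = (7.0, 3.0)    # second tangent vector (w_x, w_y)

dx_tensor_dy = v[0] * w[1]                  # x-component of the first times y-component of the second
dy_tensor_dx = v[1] * w[0]
dx_wedge_dy  = dx_tensor_dy - dy_tensor_dx  # or half this, depending on your convention

print(dx_tensor_dy, dx_wedge_dy)            # 6.0 -29.0
```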

Now antisymmetrize this, and we can consider $ \mathrm d \wedge \mathrm d x $. Then if $ y = f ( x ) $, $ \mathrm d \wedge \mathrm d y = f ' ' ( x ) \, \mathrm d x \wedge \mathrm d x + f ' ( x ) \, \mathrm d \wedge \mathrm d x $. But this is the exterior product (aka wedge product) and exterior differential (aka exterior derivative) that you know from exterior differential forms, and so this all comes to zero. Of course, in more dimensions, there are more interesting exterior forms, but $ \mathrm d \wedge \mathrm d $ will still be zero. When working exclusively with exterior forms, one may leave out all of the wedges; this is often done with the exterior product and essentially always done with the exterior differential. But I have included all of the wedges here to contrast with the kind of multiplication and differentiation that appears in the second differential.


The ordinary "first order" differential can be seen as a linearization. The second order differential can then be seen as a "quadricization". This might best be seen in multidimensional analysis.

First order differential: $$df(v) = \sum_i \frac{\partial f}{\partial x^i} v^i,$$ where $v^i$ is the component of the vector $v$ in the $x^i$ direction.

Second order differential: $$d^2f(v_1, v_2) = \sum_{i,j} \frac{\partial^2 f}{\partial x^i \, \partial x^j} v_1^i v_2^j.$$
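A minimal NumPy sketch of these two formulas (the function $f(x, y) = x^2 y$, the point, and the vectors are hypothetical): the first differential contracts the gradient with one vector, and the second contracts the Hessian with two.

```python
# Sketch: df(v1) = sum_i (df/dx^i) v1^i  and  d^2 f(v1, v2) = sum_{i,j} (d^2 f/dx^i dx^j) v1^i v2^j.
import numpy as np

def grad_f(p):                 # gradient of f(x, y) = x^2 * y
    x, y = p
    return np.array([2 * x * y, x**2])

def hess_f(p):                 # Hessian of the same f
    x, y = p
    return np.array([[2 * y, 2 * x],
                     [2 * x, 0.0]])

p  = np.array([1.0, 2.0])
v1 = np.array([0.5, -1.0])
v2 = np.array([3.0, 4.0])

df  = grad_f(p) @ v1           # first-order differential applied to v1
d2f = v1 @ hess_f(p) @ v2      # second-order differential applied to (v1, v2)
print(df, d2f)                 # 1.0 4.0
```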