If d/dx is an operator, on what does it operate?

(From the post on my blog:)

To my way of thinking, this is a serious question, and I am not really satisfied by the other answers and comments, which seem to answer a different question than the one that I find interesting here.

The problem is this. We want to regard $\frac{d}{dx}$ as an operator in the abstract senses mentioned by several of the other comments and answers. In the most elementary situation, it operates on functions of a single real variable, returning another such function, the derivative. And the same for $\frac{d}{dt}$.

The problem is that, described this way, the operators $\frac{d}{dx}$ and $\frac{d}{dt}$ seem to be the same operator, namely, the operator that takes a function to its derivative, but nevertheless we cannot seem freely to substitute these symbols for one another in formal expressions. For example, if an instructor were to write $\frac{d}{dt}x^3=3x^2$, a student might object, "don't you mean $\frac{d}{dx}$?" and the instructor would likely reply, "Oh, yes, excuse me, I meant $\frac{d}{dx}x^3=3x^2$. The other expression would have a different meaning."

But if they are the same operator, why don't the two expressions have the same meaning? Why can't we freely substitute different names for this operator and get the same result? What is going on with the logic of reference here?

The situation is that the operator $\frac{d}{dx}$ seems to make sense only when applied to functions whose independent variable is described by the symbol "x". But this collides with the idea that what the function is at bottom has nothing to do with the way we represent it, with the particular symbols that we might use to express which function is meant. That is, the function is the abstract object (whether interpreted in set theory or category theory or whatever foundational theory), and is not connected in any intimate way with the symbol "$x$". Surely the functions $x\mapsto x^3$ and $t\mapsto t^3$, with the same domain and codomain, are simply different ways of describing exactly the same function. So why can't we seem to substitute them for one another in the formal expressions?

The answer is that the syntactic use of $\frac{d}{dx}$ in a formal expression involves a kind of binding of the variable $x$.

Consider the issue of collision of bound variables in first order logic: if $\varphi(x)$ is the assertion that $x$ is not maximal with respect to $\lt$, expressed by $\exists y\ x\lt y$, then $\varphi(y)$, the assertion that $y$ is not maximal, is not correctly described as the assertion $\exists y\ y\lt y$, which is what would be obtained by simply replacing the occurrence of $x$ in $\varphi(x)$ with the symbol $y$. For the intended meaning, we cannot simply syntactically replace the occurrence of $x$ with the symbol $y$, if that occurrence of $x$ falls under the scope of a quantifier.

Similarly, although the functions $x\mapsto x^3$ and $t\mapsto t^3$ are equal as functions of a real variable, we cannot simply syntactically substitute the expression $x^3$ for $t^3$ in $\frac{d}{dt}t^3$ to get $\frac{d}{dt}x^3$. One might even take the latter as a kind of ill-formed expression, without further explanation of how $x^3$ is to be taken as a function of $t$.

So the expression $\frac{d}{dx}$ causes a binding of the variable $x$, much like a quantifier might, and this prevents free substitution in just the way that collision does. But the case here is not quite the same as the way $x$ is a bound variable in $\int_0^1 x^3\ dx$, since $x$ remains free in $\frac{d}{dx}x^3$, but we would say that $\int_0^1 x^3\ dx$ has the same meaning as $\int_0^1 y^3\ dy$.

Of course, the issue evaporates if one uses a notation, such as the $\lambda$-calculus, which insists that one be completely explicit about which syntactic variables are to be regarded as the independent variables of a functional term, as in $\lambda x.x^3$, which means the function of the variable $x$ with value $x^3$. And this is how I take several of the other answers to the question, namely, that the use of the operator $\frac{d}{dx}$ indicates that one has previously indicated which of the arguments of the given function is to be regarded as $x$, and it is with respect to this argument that one is differentiating. In practice, this is almost always clear without much remark. For example, our use of $\frac{\partial}{\partial x}$ and $\frac{\partial}{\partial y}$ seems to manage very well in complex situations, sometimes with dozens of variables running around, without adopting the onerous formalism of the $\lambda$-calculus, even if that formalism is what these solutions are essentially really about.

Meanwhile, it is easy to make examples where one must be very specific about which variables are the independent variable and which are not, as Todd mentions in his comment to David's answer. For example, cases like

$$\frac{d}{dx}\int_0^x(t^2+x^3)dt\qquad \frac{d}{dt}\int_t^x(t^2+x^3)dt$$

are surely clarified for students by a discussion of the usage of variables in formal expressions and more specifically the issue of bound and free variables.
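
For instance, the first of these can be evaluated with the Leibniz integral rule (a worked check, not part of the original answer):

$$\frac{d}{dx}\int_0^x(t^2+x^3)\,dt = (x^2+x^3) + \int_0^x 3x^2\,dt = x^2+x^3+3x^3 = x^2+4x^3,$$

where the first term comes from the upper limit of integration and the second from differentiating the integrand in $x$ with the bound variable $t$ held fixed. The second displayed expression is precisely the sort that this discussion flags, since $t$ occurs there both bound (as the integration variable) and free (as the lower limit).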


Not sure why this question is back on the front page, but I just wanted to add that the situation seems to be clarified by temporarily generalising to higher dimensions and to curved spaces, i.e., by taking a differential geometry perspective.

Firstly, a quick reminder of the concept of a dual basis in linear algebra: if one has an $n$-dimensional vector space $V$ (let's say over the reals ${\bf R}$ for sake of discussion), and one has a basis $e^1,\dots,e^n$ of it, then there is a unique dual basis $e_1,\dots,e_n$ of the dual space $V^* = \mathrm{Hom}(V,{\bf R})$, such that $e_i(e^j) = \delta_i^j$ for all $i,j=1,\dots,n$ ($\delta_i^j$ being the Kronecker delta, and where I am trying to choose subscripts and superscripts in accordance with Einstein notation). It is worth pointing out that while each dual basis element $e_i$ is "dual" to its counterpart $e^i$ in the sense that $e_i(e^i) = 1$, $e_i$ is not determined purely by $e^i$ (except in the one-dimensional case $n=1$); one must also know all the other vectors in the basis besides $e^i$ in order to calculate $e_i$.
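
The dependence of $e_1$ on the rest of the basis can be seen concretely in two dimensions; a minimal sketch (the particular vectors are hypothetical choices, not from the answer):

```python
# Dual basis depends on the whole basis, not just the matching vector.
# Columns of B are the basis vectors e^1, e^2; rows of B^{-1} are the
# dual functionals e_1, e_2, so that e_i(e^j) = delta_i^j.

def inv2(B):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = B
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def dual_basis(e1, e2):
    """Rows of the inverse of the matrix whose columns are e1, e2."""
    B = [[e1[0], e2[0]], [e1[1], e2[1]]]
    return inv2(B)

# Same e^1 = (1, 0), two different choices of e^2:
d_a = dual_basis((1, 0), (1, 1))
d_b = dual_basis((1, 0), (0, 1))

print(d_a[0])  # [1.0, -1.0]
print(d_b[0])  # [1.0, 0.0]
```

Although $e^1=(1,0)$ is unchanged, its dual functional $e_1$ changes when $e^2$ does.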

In a similar spirit, whenever one has an $n$-dimensional smooth manifold $M$, and (locally) one has $n$ smooth coordinate functions $x^1,\dots,x^n: M \to {\bf R}$ on this manifold, whose differentials $dx^1,\dots,dx^n$ form a basis of the cotangent space at every point $p$ of the manifold $M$, then (locally at least) there is a unique "dual basis" of derivations $\partial_1,\dots,\partial_n$ on $C^\infty(M)$ with the property $\partial_i x^j = \delta_i^j$ for $i,j=1,\dots,n$. (By the way, proving this claim is an excellent exercise for someone who really wants to understand the modern foundations of differential geometry.)

Now, traditionally, the derivation $\partial_i$ is instead denoted $\frac{\partial}{\partial x^i}$. But the notation is a bit misleading as it suggests that $\frac{\partial}{\partial x^i}$ only depends on the $i^{th}$ coordinate function $x^i$, when in fact it depends on the entire basis $x^1,\dots,x^n$ of coordinate functions. One can fix this by using more complicated notation, e.g., $\frac{\partial}{\partial x^i}|_{x^1,\dots,x^{i-1},x^{i+1},\dots,x^n}$, which informally means "differentiate with respect to $x^i$ while holding the other coordinates $x^1,\dots,x^{i-1},x^{i+1},\dots,x^n$ fixed". One sees this sort of notation for instance in thermodynamics. Of course, things are much simpler in the one-dimensional setting $n=1$; here, any coordinate function $x$ (with differential $dx$ nowhere vanishing) gives rise to a unique derivation $\frac{d}{dx}$ such that $\frac{d}{dx} x = 1$.
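
The dependence on the whole coordinate system can be checked symbolically; a small sympy sketch (the function $f=xy$ and the second chart $(x,u)$ with $u=x+y$ are hypothetical choices):

```python
# The partial derivative with respect to x depends on which other
# coordinates are held fixed, not just on the coordinate function x itself.
import sympy as sp

x, y, u = sp.symbols('x y u')

f = x * y

# d/dx holding y fixed, i.e. in the chart (x, y):
df_dx_holding_y = sp.diff(f, x)

# d/dx holding u = x + y fixed: rewrite f in the chart (x, u) first.
f_in_xu = f.subs(y, u - x)                  # f = x*(u - x)
df_dx_holding_u = sp.diff(f_in_xu, x).subs(u, x + y)

print(df_dx_holding_y)   # y
print(sp.simplify(df_dx_holding_u))  # y - x
```

The two answers differ, even though the coordinate function $x$ is the same in both charts.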

With this perspective, we can finally answer the original question. The symbol $x$ refers to a coordinate function $x: M \to {\bf R}$ on the one-dimensional domain $M$ that one is working on. Usually, one "simplifies" things by identifying $M$ with ${\bf R}$ (or maybe a subset thereof, such as an interval $[a,b]$) and setting $x$ to be the identity function $x(p) = p$, but here we will adopt instead a more differential geometric perspective and refuse to make this identification. The inputs to $\frac{d}{dx}$ are smooth (or at least differentiable) functions $f$ on the one-dimensional domain $M$. Again, one usually "simplifies" things by thinking of $f$ as functions of the coordinate function $x$, but really they are functions of the position variable $p$; this distinction between $x$ and $p$ is usually obscured due to the above-mentioned "simplification" $x(p)=p$, which is convenient for calculation but causes conceptual confusion by conflating the map with the territory.

Thus, for instance, the identity $$ \frac{d}{dx} x^2 = 2x$$ should actually be interpreted as $$ \frac{d}{dx} (p \mapsto x(p)^2) = (p \mapsto 2x(p)),$$ where $p \mapsto x(p)^2$ denotes the function that takes the position variable $p$ to the quantity $x(p)^2$, and similarly for $p \mapsto 2x(p)$.
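
This interpretation can be tested with a coordinate that is deliberately not the identity; a minimal sympy sketch (the choice $x(p)=p^3+p$, which has $dx/dp>0$ everywhere, is hypothetical):

```python
# d/dx x^2 = 2x, with d/dx realized as the derivation (d/dp)/(dx/dp)
# acting on functions of the position variable p.
import sympy as sp

p = sp.symbols('p')
x_of_p = p**3 + p          # the coordinate function x: M -> R
f = x_of_p**2              # the function p |-> x(p)^2

d_dx_f = sp.diff(f, p) / sp.diff(x_of_p, p)

print(sp.simplify(d_dx_f - 2 * x_of_p))  # 0, i.e. d/dx x^2 = 2x
```

The identity $\frac{d}{dx}x^2=2x$ holds as an identity between functions of $p$, with no need for $x(p)=p$.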

If one also had another coordinate $t: M \to {\bf R}$ on the same domain $M$, then one would have another derivation $\frac{d}{dt}$ on $M$, which is related to the original derivation $\frac{d}{dx}$ by the usual chain rule $$ \frac{d}{dt} f = \left(\frac{d}{dt} x\right) \left(\frac{d}{dx} f\right).$$ Again, for conceptual clarity, $t, x, f: M \to {\bf R}$ should all be viewed here as functions of a position variable $p \in M$, rather than being viewed as functions of each other.
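
The chain rule in this form can also be verified symbolically; a sketch under hypothetical choices of the coordinates ($x(p)=p$, $t(p)=2p+1$) and of the function ($f(p)=p^2$):

```python
# Chain rule  d/dt f = (d/dt x) * (d/dx f),  with t, x, f all viewed as
# functions of the position variable p.
import sympy as sp

p = sp.symbols('p')
x_of_p = p
t_of_p = 2 * p + 1
f = p**2

def derivation(coord, g):
    """Apply d/d(coord) to g, both given as functions of p."""
    return sp.diff(g, p) / sp.diff(coord, p)

lhs = derivation(t_of_p, f)
rhs = derivation(t_of_p, x_of_p) * derivation(x_of_p, f)

print(sp.simplify(lhs - rhs))  # 0
```

Both sides come out to the same function of $p$, without ever expressing $f$ "as a function of $t$" or "of $x$".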


The accepted answer is good in that it draws attention to the subtleties involved, but as far as I can tell it doesn't really settle the matter.

Joel is careful to speak of a kind of binding of $x$ by $\frac{d}{dx}$, but at the same time he mentions that $x$ remains free in $\frac{d}{dx}x^3$. So is it free or bound?

It cannot be bound in the traditional sense (and Joel says that), otherwise we'd be allowed to rename bound variables ($\alpha$-convert) and write $$ \frac{d}{dx}x^2 = \frac{d}{dt}t^2, $$ which everyone since Leibniz would simplify to $$ 2x=2t. $$ It's probably a bad idea to have a mechanism which allows us to conclude that any two free variables are equal.

On the other hand $x$ cannot be free in the traditional sense, since if we substitute say $5$ for $x$ we'd get $$ \frac{d}{d5}5^2. $$ Most people would consider this meaningless. Even if we don't consider it meaningless, I fail to see how one could get from there to the expected result of $10$. (Certainly if you allow substituting $5$ for $x$ in $\frac{d}{dx}x^2$ you would also allow substituting $25$ for $5^2$ in $\frac{d}{d5}5^2$ to rewrite it as $\frac{d}{d5}25.$ But the same expression results if we substitute $5$ for $x$ in $\frac{d}{dx}(20+x)$, with the expected result now being 1.)

So we conclude that $x$ is neither bound nor free in $\frac{d}{dx}x^2$. But which kind of binding is it then?

From a modern perspective it's tempting to say that $\frac{d}{dx}x^2$ is 'syntactic sugar' for $(\lambda x.x^2)' (x)$, where $f'$ denotes the derivative of a map $f:\mathbb{R}\to \mathbb{R}$ and $\lambda x.x^2$ is lambda calculus notation for the map $x\mapsto x^2$. But the expression $(\lambda x.x^2)' (x)$ has both a free $x$ (in the second parenthesis) and a bound $x$ (inside the $\lambda x.x^2$), while it's not clear which $x$ in $\frac{d x^2}{dx}$ is free/bound. So if we really want to interpret $\frac{d x^2}{dx}$ as syntactic sugar for $(\lambda x.x^2)' (x)$, there seems to be a proof missing that this notation is correct (which reminds me of Mike Shulman's question). We might also conclude what Andrej Bauer suggested elsewhere, that maybe $\frac{d f(x)}{dx}$ is broken notation that we should stop teaching.
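
The desugaring $(\lambda x.x^2)'(x)$ can at least be exhibited concretely; a hypothetical sympy sketch (the operator `D` and the names below are mine, not from the answer):

```python
# d/dx x^2  read as  (lambda x. x^2)'(x): the derivative operator D acts
# on the function (whose variable is bound), and the result is then
# applied to a free argument.
import sympy as sp

x = sp.symbols('x')

def D(f):
    """Derivative of a unary map f: R -> R, returned as a map."""
    s = sp.Symbol('s')
    return sp.Lambda(s, sp.diff(f(s), s))

square = sp.Lambda(x, x**2)   # lambda x. x^2  -- this x is bound
print(D(square)(x))           # 2*x  -- this x is free
print(D(square)(5))           # 10
```

On this reading, substituting $5$ for $x$ only hits the free occurrence, giving $(\lambda x.x^2)'(5)=10$ as expected, while the bound occurrence inside the $\lambda$ is untouched.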

Instead I'll argue that there is a consistent way of making sense of the notation $\frac{dy}{dx}$. It was already suggested in your question: interpret $\frac{d}{dx}$ as acting on "functions of $x$". You rightly ask what functions of $x$ are. Here's one way to answer that: interpret the variables $x$, $y$ of calculus as differentiable maps from a manifold $M$ (the state space) to $\mathbb{R}$. Call one such variable $y$ a function of $x$, if there exists $f:\mathbb{R}\to\mathbb{R}$ such that $y=f\circ x$. One can easily prove that if $y$ is a function of $x$ in this sense, then there is a unique $z:M\to \mathbb{R}$ such that $dy=z\cdot dx$ where $dx,dy$ are differential forms in the sense of modern differential geometry. (Indeed $z=f'\circ x$; this used to be called the differential coefficient of $dy$ with respect to $dx$.) Denote this unique $z$ with $\frac{dy}{dx}$.
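
The existence part is a one-line computation in a local coordinate $p$ on the one-dimensional $M$ (a worked sketch, assuming $dx$ is nowhere vanishing):

$$dy = d(f\circ x) = (f'\circ x)\,x'(p)\,dp = (f'\circ x)\,dx,$$

so $z=f'\circ x$ works, and uniqueness follows because $dx$ is nowhere vanishing.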

You might not be very happy with the manifold $M$ appearing here, since it never appeared explicitly in the old calculus. I am not very happy with it either, which is why I asked this question, and only found now that you had already asked a very similar question several years earlier. (The answers you received there unfortunately don't satisfy me.)