Understanding the differential $dx$ when doing $u$-substitution

Okay, so this is a slightly tricksy problem: at first people think "Oh, it's fine, I just multiply through by this thing", but then they think "Hang on, we've never actually defined what we meant by this... it's just some shorthand trickery", and then, if they eventually do differential geometry, they think "Aha! So it did make sense all along!"

The main message I want you to take away is that things like $\mathrm d x$ are actually well-defined objects called differential forms, and you don't need to understand them in any detail to see how they work here.

The way they end up working in integration and changes of variable is roughly the following: they come together to make a volume form which just tells you how much volume a small range of your parameters corresponds to. (I say "come together" because if you are doing many integrations like in $\int \int f(x,y) \;\mathrm d x \mathrm d y$ then you get a bunch of the $\mathrm d [...]$ things all together.) More precisely, remember how you can roughly define integration as a limit of a sum like $$\int_a^b f(x) \mathrm d x \equiv \lim_{N\to\infty}\sum_{n=1}^N f(x_n) \left(x_n-x_{n-1}\right)$$ where $x_0=a,x_N=b$ and the other points $x_i$ are chosen in between, such that all gaps $\delta_n = x_n-x_{n-1} \to 0$ (say, uniformly) as $N\to \infty$. Here, $\delta_n = x_n-x_{n-1}$ is providing some measure of how important the bit of space between $x_{n-1},x_n$ is in computing the integral. The $\mathrm d x$ is what keeps track of that information.
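A minimal Python sketch of the sum above, assuming a uniform grid and right endpoints $x_n$ (the helper name `riemann_sum` is my own, not from any library). Every term is $f(x_n)$ times the gap $x_n - x_{n-1}$, which is exactly the weight that $\mathrm d x$ records:

```python
from typing import Callable


def riemann_sum(f: Callable[[float], float], a: float, b: float, n: int) -> float:
    """Right-endpoint sum  sum_{k=1}^{n} f(x_k) * (x_k - x_{k-1})  on a uniform grid."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]  # x_0 = a, ..., x_n = b
    # each term weights f(x_k) by the width of the gap it stands for
    return sum(f(xs[k]) * (xs[k] - xs[k - 1]) for k in range(1, n + 1))


for n in (10, 100, 1000):
    print(n, riemann_sum(lambda x: x * x, 0.0, 1.0, n))
```

For $f(x)=x^2$ on $[0,1]$ the printed values approach $\int_0^1 x^2\,\mathrm dx = 1/3$ as $N$ grows.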

Suppose you then try $u=x^2$, i.e. $x=\sqrt u$ (taking $0\le a<b$ so that this is invertible), and use the points $u_n = x_n^2$. Then in general $$\int_{a^2}^{b^2} f(\sqrt u)\, \mathrm d u \equiv \lim_{N\to\infty}\sum_{n=1}^N f(\sqrt{u_n}) \left(u_n-u_{n-1}\right) = \lim_{N\to\infty}\sum_{n=1}^N f(x_n) \left(x_n^2-x_{n-1}^2\right)\neq \int_a^b f(x)\, \mathrm d x$$ because the weights are different!

But notice that $x_n^2-x_{n-1}^2 = (x_n-x_{n-1})(x_n+x_{n-1}) \approx (x_n-x_{n-1})(2x_n)$ in the limit of fine spacing, so dividing each weight by $2x_n = 2\sqrt{u_n}$ fixes the discrepancy: $$\int_{a^2}^{b^2} \frac{f(\sqrt u)}{2\sqrt u}\, \mathrm d u = \int_a^b f(x) \; \mathrm d x$$
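To see the weights at work numerically, here is a rough sketch under assumptions of my own: $f = \exp$ on $[a,b]=[1,2]$ and a uniform grid in $x$ (so the $u_n = x_n^2$ are not uniform). It computes the plain sum, the sum with weights $x_n^2 - x_{n-1}^2$, and the same sum with each weight divided by $2x_n$; the helper `weighted_sums` is hypothetical:

```python
import math


def weighted_sums(f, a, b, n):
    """Return (plain, du-weighted, corrected) right-endpoint sums on x_0=a..x_n=b, 0 <= a < b."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    # weights x_k - x_{k-1}: the sum defining the integral of f dx
    plain = sum(f(xs[k]) * (xs[k] - xs[k - 1]) for k in range(1, n + 1))
    # weights u_k - u_{k-1} = x_k^2 - x_{k-1}^2: the sum defining the du-integral
    du_weighted = sum(f(xs[k]) * (xs[k] ** 2 - xs[k - 1] ** 2) for k in range(1, n + 1))
    # the same weights divided by 2 x_k, the correction factor from the text
    corrected = sum(
        f(xs[k]) * (xs[k] ** 2 - xs[k - 1] ** 2) / (2 * xs[k]) for k in range(1, n + 1)
    )
    return plain, du_weighted, corrected


plain, du_w, corrected = weighted_sums(math.exp, 1.0, 2.0, 10_000)
exact = math.exp(2.0) - math.exp(1.0)  # the integral of e^x dx over [1, 2]
print(plain, du_w, corrected, exact)
```

The plain and corrected sums both land near $e^2 - e \approx 4.67$, while the $u$-weighted sum lands near $\int_1^2 2x e^x\,\mathrm dx = 2e^2 \approx 14.78$: the extra factor of $2x$ is exactly what the division removes.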

We're really analyzing the difference between the volumes of the little patches of space when we play with the differentials. The key is to realize that, in general, just as here, $\mathrm d u = u'(x)\, \mathrm dx$. In higher-dimensional integrals, you will discover that the generalization is, e.g., $$\int f(x,y) \;\mathrm d x\,\mathrm d y = \int f\big(x(u,v),y(u,v)\big) \; J \; \mathrm d u\,\mathrm d v,$$ where $u=u(x,y)$, $v=v(x,y)$, and $J$ is a quantity called the Jacobian (determinant), built from all the partial derivatives of $u,v$ with respect to $x,y$ in a particular way.
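As an illustration (my choice, not something the text above specifies), take polar coordinates $u = r$, $v = \theta$ on the unit disk, where the Jacobian works out to $J = r$. The sketch below integrates $f = 1$ with a midpoint rule on a grid in $(r,\theta)$, once with $J$ and once without; the helper `polar_integral` is hypothetical:

```python
import math


def polar_integral(f, n_r, n_theta, with_jacobian=True):
    """Midpoint rule for the integral of f(x, y) over the unit disk, computed in (r, theta)."""
    dr = 1.0 / n_r
    dtheta = 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            x, y = r * math.cos(theta), r * math.sin(theta)  # f is evaluated at x(u,v), y(u,v)
            jac = r if with_jacobian else 1.0                # J = r for polar coordinates
            total += f(x, y) * jac * dr * dtheta
    return total


area = polar_integral(lambda x, y: 1.0, 200, 200)
wrong = polar_integral(lambda x, y: 1.0, 200, 200, with_jacobian=False)
print(area, wrong)
```

With $J$ the result is the disk's area $\pi$; without it you get $2\pi$, which is just the area of the rectangle $[0,1]\times[0,2\pi]$ in the $(r,\theta)$ plane, i.e. the wrong volume form.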

The notation $$\frac{\mathrm d u}{\mathrm d x} = \lim_\text{fine spacing}\frac{\delta u}{\delta x} = \lim_{x_n-x_{n-1}\to 0}\frac{u_n-u_{n-1}}{x_n-x_{n-1}} = u'(x)$$ is now seen to be just a suggestive notation which works for the case of only one variable changing. It's used because it makes it clear how the volume form should be replaced.
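A tiny numerical version of this limit, assuming $u(x) = x^2$ and $x = 3$ (both chosen only for illustration; `difference_quotient` is my own helper name): the quotient $\delta u/\delta x$ over a gap of width $h$ is $6 + h$, so it tends to $u'(3) = 6$ as the spacing shrinks.

```python
def difference_quotient(u, x, h):
    """(u(x + h) - u(x)) / h: the ratio delta u / delta x over a gap of width h."""
    return (u(x + h) - u(x)) / h


def square(t):
    return t * t


for h in (1e-1, 1e-3, 1e-5):
    print(h, difference_quotient(square, 3.0, h))  # tends to u'(3) = 6
```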


When there are many variables, this notation breaks down because the factors all get mixed up together, and people write partial derivatives instead (you'll see these soon if you haven't already). It turns out to make sense to use a generalization of the $$\mathrm d u = u'(x)\, \mathrm d x$$ law, a form of the chain rule, in which, for example, for $u=u(x,y)$, $$\mathrm d u = u_x\, \mathrm d x + u_y\, \mathrm d y,$$ where $u_x(x,y)$ is the derivative of $u$ with respect to $x$ when we treat $y$ as a constant (the partial derivative $\partial u/\partial x$).
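A quick sketch checking $\mathrm d u = u_x\,\mathrm d x + u_y\,\mathrm d y$ numerically, for the hypothetical example $u(x,y) = x^2 y$ at $(1,2)$ with both steps equal to $h$. The linear prediction misses the true change by $4h^2 + h^3$, which vanishes faster than $h$ itself, so it is the right first-order replacement for $\mathrm d u$:

```python
def u(x, y):
    return x * x * y


def u_x(x, y):  # partial derivative in x, holding y fixed
    return 2 * x * y


def u_y(x, y):  # partial derivative in y, holding x fixed
    return x * x


x0, y0 = 1.0, 2.0
for h in (1e-1, 1e-2, 1e-3):
    actual = u(x0 + h, y0 + h) - u(x0, y0)           # true change in u
    linear = u_x(x0, y0) * h + u_y(x0, y0) * h       # u_x dx + u_y dy with dx = dy = h
    print(h, actual, linear, actual - linear)        # gap shrinks like h^2
```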

You'll have to wait until a differential geometry course to see how to use this to get the Jacobian factor. It turns out that, rather than just writing the forms next to each other, you should technically use something called the wedge product, defined so that $a\wedge b = -b\wedge a$ for one-forms like $\mathrm d x$ (in particular $\mathrm d x\wedge\mathrm d x = 0$); then you get $$\mathrm d u \wedge \mathrm d v = (u_x \mathrm d x + u_y \mathrm d y)\wedge(v_x \mathrm d x + v_y \mathrm d y) = (u_x v_y-u_y v_x) \mathrm d x \wedge \mathrm d y$$ so that the Jacobian is one over the absolute value of $(u_x v_y-u_y v_x)=\det \pmatrix{u_x & u_y \\ v_x & v_y}$.
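The wedge rule above can be mimicked in a few lines, as a sketch restricted to one-forms on the plane (the `OneForm` class and `wedge` function are my own illustrative names). A one-form $a\,\mathrm dx + b\,\mathrm dy$ is stored as the pair $(a,b)$, and the wedge of two of them returns the coefficient of $\mathrm dx\wedge\mathrm dy$:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OneForm:
    """The one-form a dx + b dy on the plane, stored by its two coefficients."""
    a: float
    b: float


def wedge(p: OneForm, q: OneForm) -> float:
    """Coefficient c in p ^ q = c dx^dy, using dx^dx = dy^dy = 0 and dy^dx = -dx^dy."""
    return p.a * q.b - p.b * q.a


dx, dy = OneForm(1.0, 0.0), OneForm(0.0, 1.0)
# Hypothetical example: u = x + y, v = x - y, so du = dx + dy and dv = dx - dy.
du, dv = OneForm(1.0, 1.0), OneForm(1.0, -1.0)
c = wedge(du, dv)        # -2.0, i.e. du^dv = -2 dx^dy
jacobian = 1 / abs(c)    # 0.5, i.e. dx dy = (1/2) du dv
```

For this change of variables $\mathrm du\wedge\mathrm dv = -2\,\mathrm dx\wedge\mathrm dy$; the sign only records that $(u,v)$ has the opposite orientation to $(x,y)$, and the area factor is $J = 1/2$.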

You can get this result directly by thinking about little patches of volume, however, so you'll see it far earlier than any differential-form machinery. I just thought that, since you were curious, you should hear the full story along the way.


The use of infinitesimals can only be formalized and analysed with care in non-standard analysis. When people first study calculus, the most elementary books define $dx$ as $\Delta x$ approaching zero; people usually get confused and think "but this should be zero", and in standard analysis it really would be.

Things are much easier than this, however: we just have to throw away those $dx$ and $dy$. Why? Simply because modern mathematics has adopted methods that are more sophisticated and yet simpler, and because if you are going to proceed in mathematics you'll really need those modern methods when studying analysis or differential geometry (where $dx$ receives a true definition and gains a fundamental role).

Now you might ask the same question I asked when I first encountered the rigorous framework: "this guy is mad! My book talks about infinitesimals, my teacher told me it's all right, he must be mad". But it's not like that. I'll show you two examples: the first is meant to show you that, when working with integrals, these things appear just as mnemonic rules that allow you to remember the true formula more easily when you're starting out. The second shows you where things become confusing when infinitesimals are used without care.

First, consider your integral:

$$\int2x(x^2+4)^{100}dx$$

The idea is that this can be rewritten in a convenient form. Note that if we set $f(x) = x^2+4$, then $f'(x)=2x$. Then we are integrating:

$$\int f'(x) (f(x))^{100}dx$$

Now, if we set $g(x)= x^{100}$, then $(f(x))^{100} = g(f(x))$, so we are integrating the composition:

$$\int g(f(x))f'(x)dx$$

Now let $G$ be a primitive of $g$. By the chain rule, the integrand is just $(G(f(x)))'=G'(f(x))f'(x) = g(f(x))f'(x)$, so, since the indefinite integral is a primitive and the integrand is a derivative, the result is simply:

$$\int (G(f(x)))'\,dx = G(f(x)) + C$$

And since $g(x) = x^{100}$, the obvious primitive is $G(x) = x^{101}/101$, and hence:

$$\int 2x(x^2+4)^{100}\,dx = \frac{(x^2+4)^{101}}{101} + C$$

We can "remember" this by setting $u = x^2+4$ and $du=2x\,dx$, which turns the integral into $\int u^{100}\,du = \frac{u^{101}}{101} + C$. So the substitution is just a mnemonic for finding a formula that is really an application of the chain rule.
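The substitution can also be sanity-checked numerically on a definite integral. This sketch (the interval $[0,1]$ and the `midpoint` helper are my own choices) integrates $2x(x^2+4)^{100}$ in $x$, integrates $u^{100}$ in $u$ over the corresponding limits $u(0)=4$ to $u(1)=5$, and compares both with $G(f(1)) - G(f(0)) = (5^{101}-4^{101})/101$:

```python
def midpoint(f, a, b, n):
    """Composite midpoint rule for the integral of f over [a, b] with n panels."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def integrand_x(x):
    return 2 * x * (x * x + 4) ** 100    # the original integrand in x


def integrand_u(u):
    return u ** 100                      # after u = x^2 + 4, du = 2x dx


def relative_error(value, reference):
    return abs(value - reference) / abs(reference)


in_x = midpoint(integrand_x, 0.0, 1.0, 10_000)
in_u = midpoint(integrand_u, 4.0, 5.0, 10_000)   # limits x=0 -> u=4, x=1 -> u=5
exact = (5 ** 101 - 4 ** 101) / 101              # G(f(1)) - G(f(0)), exact ints then float
print(relative_error(in_x, exact), relative_error(in_u, exact))
```

Both relative errors come out tiny; the numbers themselves are huge (around $10^{68}$), which is why the comparison is relative rather than absolute.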

The second example is the chain rule itself. Usually people write:

$$\frac{df}{dx} = \frac{df}{du} \frac{du}{dx}$$

But look: on the left-hand side you are differentiating not $f$ but rather $f\circ u$, so $f$ means one thing on the left and another thing on the right! The precise statement is $(f\circ u)'(x) = f'(u(x))\,u'(x)$. So using this language of infinitesimals the wrong way around may lead to confusion and may hide from you the true nature of what you are studying. The book I used when I moved from the infinitesimal treatment to the rigorous one was Spivak's Calculus. Try it! He develops everything formally, without appeal to those "undefined" creatures, and shows you where they appear just as ways to remember formulas.
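To make the point about $f$ meaning two different things concrete, here is a sketch with finite differences, assuming $f(u) = u^3$ and $u(x) = \sin x$ purely for illustration (`derivative` and `f_of_u` are my own helper names). The composite $f\circ u$ is a separate Python function, and its derivative matches $f'(u(x))\,u'(x)$, not $f'(x)$:

```python
import math


def derivative(g, x, h=1e-6):
    """Central-difference approximation to g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


def f(u):          # outer function f(u) = u^3
    return u ** 3


def u(x):          # inner function u(x) = sin x
    return math.sin(x)


def f_of_u(x):     # the composite f o u: a different function of x than f
    return f(u(x))


x0 = 1.0
lhs = derivative(f_of_u, x0)                     # what "df/dx" really means
rhs = derivative(f, u(x0)) * derivative(u, x0)   # f'(u(x0)) * u'(x0)
naive = derivative(f, x0)                        # f'(x0): derivative of f itself at x0
print(lhs, rhs, naive)
```

At $x_0 = 1$, `lhs` and `rhs` both come out near $3\sin^2(1)\cos(1) \approx 1.148$, while `naive` is near $3$.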

I hope this helps you. Good luck!


To understand integration by substitution, you can just use the chain rule in reverse: \begin{equation} \int f(g (x)) g'(x)\, dx = F (g (x)) + C, \end{equation} where $ F $ is an antiderivative of $ f $. To check this, just take the derivative of the right-hand side using the chain rule.
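The check suggested here can be automated as a rough sketch: `check_substitution` (a hypothetical helper) compares a central-difference derivative of $F(g(x))$ with $f(g(x))\,g'(x)$ at a few sample points. The example $f=\cos$, $F=\sin$, $g(x)=x^2$ is my own:

```python
import math


def check_substitution(f, F, g, g_prime, xs, h=1e-6, tol=1e-6):
    """Numerically check that d/dx F(g(x)) equals f(g(x)) * g'(x) at every point in xs."""
    for x in xs:
        lhs = (F(g(x + h)) - F(g(x - h))) / (2 * h)  # derivative of the right-hand side
        rhs = f(g(x)) * g_prime(x)                     # the integrand
        if abs(lhs - rhs) > tol:
            return False
    return True


# Example: the integral of 2x cos(x^2) dx is sin(x^2) + C  (f = cos, F = sin, g(x) = x^2).
ok = check_substitution(math.cos, math.sin, lambda x: x * x, lambda x: 2 * x,
                        [0.0, 0.5, 1.0, 2.0])
print(ok)
```

Passing the wrong antiderivative, e.g. `F=math.cos`, makes the check return `False`.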