What's my confusion with the chain rule? (Differentiating $x^x$)

Both methods are wrong, but the fix is easy: the correct derivative is the sum of the two proposals, and this is no coincidence!

Naturally, turning a single instance of $x$ into a constant cannot be right, as that is not symmetric. The correct way is to differentiate with respect to every instance in turn, which is justified by the chain rule with partial derivatives:

$$\frac{df(u,v)}{dx}=\frac{\partial f(u,v)}{\partial u}\frac{du}{dx}+\frac{\partial f(u,v)}{\partial v}\frac{dv}{dx}.$$ In other words, you let one instance vary while the other is held constant, and sum the two cases.

Here, $f(u,v)=u^v$ with $u=v=x$, and

$$\frac{dx^x}{dx}=\frac{du^v}{dx}=vu^{v-1}\cdot1+\ln(u)u^v\cdot1=x^x+\ln(x)x^x,$$ or, with a more intuitive notation in which $v$ and $u$ are held constant in their respective terms, $$\frac{dx^x}{dx}=\frac{\partial x^v}{\partial x}\cdot1+\frac{\partial u^x}{\partial x}\cdot1=vx^{v-1}+\ln(u)u^x=x^x+\ln(x)x^x.$$


This works with as many instances of $x$ as you like. For instance $x^{x+x^2}$ seen as $u^{v+w^2}$ yields

  • varying the first instance, $(v+w^2)x^{v+w^2-1}$;

  • varying the second instance, $\ln(u)u^{x+w^2}$;

  • varying the third instance, $\ln(u)u^{v+x^2}2x$.

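Then, setting $u=v=w=x$ and summing the three terms (the first one simplifies as $(x+x^2)x^{x+x^2-1}=(1+x)x^{x+x^2}$), we get globally

$$(1+x+\ln(x)(1+2x))x^{x+x^2}.$$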
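As a sanity check, here is a minimal numerical sketch comparing the two closed forms $(1+\ln x)\,x^x$ and $(1+x+\ln(x)(1+2x))\,x^{x+x^2}$ with a central finite difference. The helper name `central_difference`, the step `h`, and the sample points are illustrative choices, not part of the original argument, and $x>0$ is assumed so that $\ln x$ is defined.

```python
import math


def central_difference(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric difference quotient (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def f1(x):
    return x ** x


def df1(x):
    # Sum over both instances of x: x * x^(x-1) + ln(x) * x^x
    return (1 + math.log(x)) * x ** x


def f2(x):
    return x ** (x + x ** 2)


def df2(x):
    # Sum over the three instances of x in x^(x + x^2)
    return (1 + x + math.log(x) * (1 + 2 * x)) * x ** (x + x ** 2)


if __name__ == "__main__":
    for x in (0.5, 1.5, 2.0):  # arbitrary positive sample points
        print(x, central_difference(f1, x), df1(x))
        print(x, central_difference(f2, x), df2(x))
```

At each sample point the two printed values on a line should agree to several significant digits; a mismatch would indicate that one of the per-instance terms was dropped or miscomputed.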


If you work with the formal definition of the chain rule, you'll see how what you're trying to do makes no sense.

But if you want to stick with the abuse of notation $\frac{dz}{dx}=\frac{dz}{dy}\frac{dy}{dx}$, I'd say that the heart of the problem is in your claim that $\frac{d(x^u)}{du}=x^u\log x$. This is only valid if $x$ is constant, and doesn't apply if $x$ is a function of $u$ (in our case, $x=u$).

That's the difference between a total derivative $\frac{d}{ds}$ and a partial derivative $\frac{\partial}{\partial s}$. The latter, $\frac{\partial f(s,t)}{\partial s}$, means "the change in $f$ when $s$ changes and nothing else does", whereas $\frac{df(s,t)}{ds}$ means "the change in $f$ when $s$ changes and everything else changes accordingly". So you can't have $u$ depend on $x$ and calculate a total derivative in a way that assumes $x$ is constant.


Both are wrong: although you set $u=x$, you replace only one occurrence of $x$ by $u$ and leave the other $x$ intact. You then differentiate with respect to $u$ by the chain rule while treating $x$ as a constant in $x^u$ and in $u^x$, which is also wrong because $x$ and $u$ are the same variable.

What you should do is:

Write $x^x$ as $e^{\ln x^x}=e^{x\ln x}$ and then differentiate with respect to $x$ using the chain rule.
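Carried out explicitly, as a sketch assuming $x>0$ so that $\ln x$ is defined, the chain rule followed by the product rule on $x\ln x$ gives

$$\frac{d}{dx}e^{x\ln x}=e^{x\ln x}\cdot\frac{d}{dx}(x\ln x)=e^{x\ln x}\left(\ln x+x\cdot\frac{1}{x}\right)=x^x(\ln x+1).$$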