What exactly is the difference between a partial derivative and a total derivative?

The key difference is that when you take a partial derivative, you hold the other variables fixed while the one you differentiate with respect to changes. When computing a total derivative, you allow a change in that variable to affect the others.

So, for instance, if you have $f(x,y) = 2x+3y$, then when you compute the partial derivative $\frac{\partial f}{\partial x}$, you temporarily assume $y$ constant and treat it as such, yielding $\frac{\partial f}{\partial x} = 2 + \frac{\partial (3y)}{\partial x} = 2 + 0 = 2$.
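A quick check of this partial derivative with sympy (assuming sympy is available):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = 2*x + 3*y

# y is treated as a constant, so its term contributes nothing:
df_dx = sp.diff(f, x)
print(df_dx)  # 2
```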

However, if $x=x(r,\theta)$ and $y=y(r,\theta)$, then the assumption that $y$ stays constant when $x$ changes is no longer valid. Since $x = x(r,\theta)$, a change in $x$ implies that at least one of $r$ or $\theta$ changes. And if $r$ or $\theta$ changes, then $y$ changes too. And if $y$ changes, it contributes to the derivative, so we can no longer treat that contribution as zero.

In your example, you are given $f(x,y) = x^2+y^2$, but what you really have is the following:

$f(x,y) = f(x(r,\theta),y(r,\theta))$.

So if you compute $\frac{\partial f}{\partial x}$, you cannot assume that the change in $x$ computed in this derivative has no effect on a change in $y$.

What you need to compute instead is $\frac{\mathrm{d} f}{\mathrm{d}\theta}$ and $\frac{\mathrm{d} f}{\mathrm{d} r}$, the first of which can be computed as:

$\frac{\mathrm{d} f}{\mathrm{d}\theta} = \frac{\partial f}{\partial \theta} + \frac{\partial f}{\partial x}\frac{\partial x}{\partial \theta} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial \theta}$
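With the concrete choices from the question ($f(x,y)=x^2+y^2$, $x = r\sin\theta$, $y = r\cos\theta$), sympy applies this chain rule automatically when you differentiate the composed expression (a sketch, assuming sympy is available):

```python
import sympy as sp

r, theta = sp.symbols('r theta')
x = r*sp.sin(theta)   # x(r, theta)
y = r*sp.cos(theta)   # y(r, theta)
f = x**2 + y**2       # f composed with x and y

# The chain-rule terms cancel: x**2 + y**2 = r**2 has no theta dependence.
print(sp.simplify(sp.diff(f, theta)))  # 0
```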


I know this answer is incredibly delayed; but just to summarise the last post:

If I gave you the function

$$ f(x,y) = \sin(x)+3y^2$$

and asked you for the partial derivative with respect to $x$, you should write:

$$ \frac{\partial f(x,y)}{\partial x} = \cos(x)+0$$

since $y$ is effectively a constant with respect to $x$. In other words, $y$ does not change when $x$ does. However, if I asked you for the total derivative with respect to $x$, you should write:

$$\frac{df(x,y)}{dx}=\cos(x)\cdot {dx\over dx} + 6y\cdot {dy\over dx}$$

Of course I've utilized the chain rule in the bottom case. You wouldn't write $dx\over dx$ in practice since it's just $1$, but you need to realise that it is there :)
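Both derivatives are easy to reproduce with sympy (assuming it is available); declaring $y$ as an unspecified function of $x$ is what makes the chain rule kick in:

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')(x)        # y is secretly a function of x
f = sp.sin(x) + 3*y**2

# Total derivative: cos(x) + 6*y(x)*y'(x), up to term ordering
print(sp.diff(f, x))
```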


Does everyone agree that the poster arrived at the correct answer?

People write $$\frac{\partial}{\partial t}g(x(t),t)$$ or $$\frac{\text{d}}{\text{d} t}g(x(t),t)$$

The first is typically used to mean "the derivative of function $g$ with respect to the second argument". The second usually means the "total derivative". There are variations on this. Some people omit the arguments and just write, for example, $\frac{\partial}{\partial t}g$.

So for example: if $x$ is secretly a function of $t$, then the notation $\frac{d}{dt}f(x,t)$ is called the total derivative and is an abbreviation for the (single-variable) derivative $g'(t)$, where $g(t)=f(x(t),t)$. In applying the chain rule to the last expression, you need some way to denote "the derivative of $f$ with respect to its first argument". Many people would write $\frac{\partial}{\partial x}f$ for this, but in many cases this is confusing, as I explain in the example below.
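To make this concrete, here is a sympy sketch with a hypothetical $f$ and $x$, chosen only for illustration: construct $g$ explicitly and take a single-variable derivative.

```python
import sympy as sp

t = sp.symbols('t')

# Hypothetical choices, just for illustration:
def f(a, b):
    return a**2 + b          # f(x, t) = x**2 + t

x = sp.sin(t)                # x is "secretly" a function of t

g = f(x, t)                  # g(t) = f(x(t), t), constructed explicitly
print(sp.diff(g, t))         # 2*sin(t)*cos(t) + 1
```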

The widespread math notation here confuses many people, and I think it is largely unnecessary. If you want to take a total derivative, explicitly construct the function (like $g$ above) and take a single-variable derivative. Otherwise, explaining the difference between total and partial derivatives requires appeals to things like temporarily fixing variables, declaring a variable "effectively constant", or switching between thinking of $x$ as a function and as an expression. These are all fuzzy moves you can pull off successfully once you already feel comfortable with what's going on. Until then, it pays to think carefully about what's really happening.

Your example

The problem stems from the conflation of an expression and a function. You did this when you wrote $w = f(x,y) = x^2 + y^2$. In that case, many will write

$\frac{\partial}{\partial x}w$ and

$\frac{\partial}{\partial x} f(x,y)$

(which are equivalent). This sort of makes sense. In both cases, the thing to the right of the differential operator is an expression which contains $x$ and $y$. The thing that is produced by applying that operator is also an expression in the same variables. This is also true of what $\frac{d}{dx}$ means. For the particular expressions above, I would just use $\frac{d}{dx}$.

The actual purpose of the partial derivative is to take the derivative of a function with respect to one of its arguments, not of an expression. That's not what's happening above. That is what's happening when people write:

$\frac{\partial}{\partial x} f$.

$f$ is not an expression. It is a function. I personally do not like this notation. You could have defined an identical $f$ by writing $f(a,b) = a^2 + b^2$. The variables that appear in the definition of a function are, in the strictest sense, invisible to the rest of the world. It's just a convenient way of stating "$f$ is a function that takes two arguments. It squares the first, squares the second, and returns the sum of the squares". Instead of having to write that sentence out (which people had to do before inventing better notation), you can instead give names to the arguments of $f$ so that you can easily refer to them when defining $f$.

But when you write $\frac{\partial}{\partial x} f$, then you are using some knowledge of how you defined $f$---that you chose the name $x$ for the first argument. It can be useful to have names for function arguments instead of just referring to their position (first, second, etc. argument), and so that's why the partial notation survives, but I think the notation needs to improve for this.

What someone typically means when they write $\frac{\partial}{\partial x} f$ is roughly "the function that takes two arguments and returns the sensitivity of $f$ with respect to its first argument". So if you're at some point $(a,b)$ or $(x,y)$ or whatever, and you wiggle the first argument $a$ or $x$, how much does the output of $f$ wiggle? That is the question that the gradient of a function is supposed to answer. This is probably what someone means if they say "normal derivative". They are thinking about only a single function, with possibly multiple arguments. And they are trying to make an object that tells you how sensitive the output of the function is to a change in each of the inputs.

The total derivative usually means that somewhere you've implicitly defined some new functions. In this case, you have made functions $x(r,\theta) = r \sin(\theta)$ and $y(r,\theta) = r \cos(\theta)$, and you can compose these functions, making a new function: $$g(r,\theta) = f(x(r,\theta),y(r,\theta))$$
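This composition can be written as plain code, with one function per definition (a minimal numeric sketch; the names mirror the math):

```python
import math

def f(x, y):
    return x**2 + y**2

def x(r, theta):
    return r * math.sin(theta)

def y(r, theta):
    return r * math.cos(theta)

# The composition: g(r, theta) = f(x(r, theta), y(r, theta))
def g(r, theta):
    return f(x(r, theta), y(r, theta))

print(g(2.0, 0.3))  # approximately 4.0, since g(r, theta) = r**2
```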

Notice again that $r$ and $\theta$ are chosen only to give a human reader information about the intended meaning of this function's arguments. If we processed things purely symbolically, the definition of $g$ could just as well have been

$$g(\text{input}_1,\text{input}_2) = f(x(\text{input}_1,\text{input}_2),y(\text{input}_1,\text{input}_2))$$

And so when the problem asked you to find $\frac{\partial}{\partial r} w$, there are two, in the end identical, interpretations of what that means. Either construct the function $g$ as I did above and report its sensitivity with respect to the first argument, OR substitute the expressions for $x$ and $y$ into the expression for $w$, so that you have an expression for $w$ in terms of $r$ and $\theta$. I prefer the approach that thinks about functions. This is how we organize code, and I think it is how we should organize math. When you deal with expressions, you effectively have a ton of global variables.

So how do we compute $\partial_1 g$, which is just notation for "make a function with the same arity (number of inputs) as $g$ that evaluates the derivative of $g$ with respect to its first argument"? It's just the chain rule.

$$[\partial_1 g](r,\theta) = [\partial_1 f](x(r,\theta), y(r,\theta)) \cdot [\partial_1 x](r,\theta) + [\partial_2 f](x(r,\theta), y(r,\theta)) \cdot [\partial_1 y](r,\theta)$$

We can see why thinking about things in this way is not popular! But this is the clearest, most mechanical, way to think about it. Otherwise you are relying on implicit punning of $x$ as a function and as an expression. Choose one and stick with it!

Anyway, to simplify the above definition, which didn't care about the definitions of $f$, $x$, or $y$, we need to use the definitions.

$f(x,y) = x^2 + y^2$ and therefore

  • $[\partial_1 f](x,y) = 2x$
  • $[\partial_2 f](x,y) = 2y$

$x(r,\theta) = r\sin(\theta)$ and therefore

  • $[\partial_1 x](r,\theta) = \sin(\theta)$

likewise

  • $[\partial_1 y](r,\theta) = \cos(\theta)$

Furthermore, though we don't need them at the moment:

  • $[\partial_2 x](r,\theta) = r\cdot \cos(\theta)$
  • $[\partial_2 y](r,\theta) = -r\cdot \sin(\theta)$
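All four partials in the table above are easy to double-check with sympy (assuming it is available):

```python
import sympy as sp

r, theta = sp.symbols('r theta')
x = r*sp.sin(theta)
y = r*sp.cos(theta)

# sympy agrees with each entry in the table above:
print(sp.diff(x, r))      # sin(theta)
print(sp.diff(y, r))      # cos(theta)
print(sp.diff(x, theta))  # r*cos(theta)
print(sp.diff(y, theta))  # -r*sin(theta)
```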

So again, the function is

$$[\partial_1 g](r,\theta) = [\partial_1 f](x(r,\theta), y(r,\theta)) \cdot [\partial_1 x](r,\theta) + [\partial_2 f](x(r,\theta), y(r,\theta)) \cdot [\partial_1 y](r,\theta)$$

substituting the functions we just computed:

$$[\partial_1 g](r,\theta) = 2x(r,\theta) \cdot \sin(\theta) + 2y(r,\theta) \cdot \cos(\theta)$$

and substituting $x$ and $y$

$$[\partial_1 g](r,\theta) = 2r\sin(\theta) \cdot \sin(\theta) + 2r\cos(\theta) \cdot \cos(\theta)$$

which, after using the very trig identity you used, is

$$[\partial_1 g](r,\theta) = 2r$$
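The whole computation can be carried out mechanically in this style, with one callable per derivative function (a sympy sketch; the names `d1f`, `d1x`, etc. mirror the $\partial_1$ notation above):

```python
import sympy as sp

r, theta = sp.symbols('r theta')

# The derivative functions computed above, as plain callables:
d1f = lambda a, b: 2*a                  # [d1 f](x, y) = 2x
d2f = lambda a, b: 2*b                  # [d2 f](x, y) = 2y
d1x = lambda rr, th: sp.sin(th)         # [d1 x](r, theta)
d1y = lambda rr, th: sp.cos(th)         # [d1 y](r, theta)
x   = lambda rr, th: rr*sp.sin(th)
y   = lambda rr, th: rr*sp.cos(th)

# Assemble [d1 g] exactly as in the chain-rule formula:
d1g = (d1f(x(r, theta), y(r, theta)) * d1x(r, theta)
       + d2f(x(r, theta), y(r, theta)) * d1y(r, theta))
print(sp.simplify(d1g))  # 2*r
```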

Yet another way to make the same point:

When you see the notation $g'(x)$, you can group that as $[g'](x)$. You've made a new function, called "g prime", which is the derivative of $g$, and you're evaluating it at point $x$. $g'(y)$ means the same thing, except you're evaluating at the point $y$. The multidimensional analogue of this is $\nabla g(\mathbf{x})$. You should parse that as $[\nabla g](\mathbf{x})$.

This is not the case with the notation $\frac{d}{dx} g(x)$. If you parse that as $[\frac{d}{dx} g](x)$, you get confused because what does $x$ mean in the scope of the brackets? You don't have to give meaning to it because it should be meaningless. The operator $\frac{d}{dx}$ applies to an expression, not a function.

But, what people will routinely do is define

$g(x)= x^2+\sin(x)+\text{whatever expression in }x$

and then write $\frac{d}{dx} g(y)$ when they really should have written $g'(y)$. They don't do this very often in the single-variable case, but they do it in the multi-variable case. I just showed the single-variable case because it's clearer to see the problem with it.

My inspiration for this answer comes from http://groups.csail.mit.edu/mac/users/gjs/6946/sicm-html/book-Z-H-78.html#%_sec_Temp_453