Rigorous proof that $dx dy=r\ dr\ d\theta$

In the geometric approach, $dr^2=0$ as it is not only small but also symmetric (see here).

In the algebraic, more rigorous approach, you are differentiating $x$ with respect to $\theta$ and $y$ with respect to $r$, but you are forgetting the cross terms! Each of these is a function of two variables, so you need all the partial derivatives:

$$ dx=\dfrac{\partial x}{\partial r}dr+\dfrac{\partial x}{\partial \theta}d\theta $$ $$ dy=\dfrac{\partial y}{\partial r}dr+\dfrac{\partial y}{\partial \theta}d\theta\;. $$

Differentiating (you already did two of them; here we do all four): $$ dx=\sin\theta\,dr+r\,\cos\theta\,d\theta $$ $$ dy=\cos\theta\,dr-r\,\sin\theta\,d\theta\;. $$ The exterior product of these two vectors (see link above, or think of the cross product, or think of the area of the parallelogram) is: $$ dx \, dy = (\sin\theta\,dr)(-r\,\sin\theta\,d\theta)-(\cos\theta\,dr)(r\,\cos\theta\,d\theta) $$ $$ = -r(\sin^2\theta +\cos^2\theta)\,dr\,d\theta=-r\,dr\,d\theta. $$ Usually one takes instead $x=r\cos\theta,\ y=r\sin\theta$, so that the minus sign becomes a plus. In any case, what matters is the absolute value of this factor.
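As a quick sanity check of the sign discussion above, here is a minimal Python sketch. The function names are hypothetical, chosen only for illustration. Each differential is stored as its pair of coefficients (of $dr$ and of $d\theta$), and the coefficient of $dr\,d\theta$ in the exterior product is the antisymmetric combination of those pairs.

```python
import math

def wedge_coefficient(dx, dy):
    """Coefficient of dr wedge dtheta in dx wedge dy.

    dx and dy are pairs (coefficient of dr, coefficient of dtheta).
    """
    return dx[0] * dy[1] - dx[1] * dy[0]

def swapped_convention(r, theta):
    # x = r sin(theta), y = r cos(theta), as in the computation above
    dx = (math.sin(theta), r * math.cos(theta))
    dy = (math.cos(theta), -r * math.sin(theta))
    return wedge_coefficient(dx, dy)

def standard_convention(r, theta):
    # x = r cos(theta), y = r sin(theta)
    dx = (math.cos(theta), -r * math.sin(theta))
    dy = (math.sin(theta), r * math.cos(theta))
    return wedge_coefficient(dx, dy)

r, theta = 2.0, 0.7
print(swapped_convention(r, theta))   # close to -r
print(standard_convention(r, theta))  # close to +r
```

Up to floating-point rounding, the two conventions differ only in sign, and both have absolute value $r$.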

I think that what you were missing is the link between area (and more generally volumes) and exterior product. Look it up starting from the link above, it's very interesting.


In fact, your mistake was to confuse partial and total differentials (your computation is copied below):

[Image: copy of the computation from the question]

$$dx = \sin{\theta}dr+r\cos{\theta} d\theta$$ $$ dy = \cos\theta dr- r\sin{\theta} d\theta $$

One has to compute the determinant of the Jacobian matrix:

$dxdy= \begin{Vmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \\ \end{Vmatrix}dr\;d\theta = \begin{Vmatrix} \sin(\theta) & r \cos(\theta) \\ \cos(\theta) & -r \sin(\theta) \\ \end{Vmatrix}dr\;d\theta = r\left|-\sin^2(\theta)-\cos^2(\theta)\right|\,dr\,d\theta $

$$dx\,dy= r\,dr\,d\theta$$
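The determinant can also be estimated numerically, without computing any derivative by hand. The following is only a sketch: `polar_map` and `jacobian_det` are hypothetical names, the step size `h` is an arbitrary choice, and the map uses the same convention as above ($x=r\sin\theta$, $y=r\cos\theta$).

```python
import math

def polar_map(r, theta):
    # Convention used in this answer: x = r sin(theta), y = r cos(theta)
    return r * math.sin(theta), r * math.cos(theta)

def jacobian_det(f, r, theta, h=1e-6):
    """Central-difference estimate of det d(x, y)/d(r, theta)."""
    x_rp, y_rp = f(r + h, theta)
    x_rm, y_rm = f(r - h, theta)
    x_tp, y_tp = f(r, theta + h)
    x_tm, y_tm = f(r, theta - h)
    dx_dr = (x_rp - x_rm) / (2 * h)
    dy_dr = (y_rp - y_rm) / (2 * h)
    dx_dt = (x_tp - x_tm) / (2 * h)
    dy_dt = (y_tp - y_tm) / (2 * h)
    return dx_dr * dy_dt - dx_dt * dy_dr

det = jacobian_det(polar_map, 1.5, 0.4)
print(det, abs(det))  # det is close to -r, its absolute value close to r
```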


How do you prove this rigorously?

The equality $dx\;dy= r\;dr\;d\theta$ by itself is nonsense (here, we don't have ordinary products on either side of the equality). What we should prove is $$\iint_Qf(x,y)\;dx\;dy=\iint_A f(r\cos \theta,r\sin \theta)\;r\;dr\;d\theta\tag{$*$}.$$

I will not give a rigorous proof, but I will tell you how a formal proof can be done according to the approach of C. H. Edwards in his book Advanced Calculus of Several Variables. Quoting this book:

The student has undoubtedly seen change of variables formulas such as $(*)$ which result from changes from rectangular coordinates to polar coordinates. The appearance of the factor $r$ in the formula is sometimes "explained" by mythical pictures, such as the figure in the OP's post, in which it is alleged that $dA$ is an "infinitesimal" rectangle with sides $dr$ and $r\;d\theta$, and therefore has area $r \;dr\; d\theta$. In this section we shall give a mathematically acceptable explanation of the origin of such factors in the transformation of multiple integrals from one coordinate system to another.

To start, let us "remember" three things about double integrals:

$\quad$ (i) From the geometric interpretation of the double integral, if $X$ is a very small region and $(x_0,y_0)$ is some point in $X$, then $$\iint_Xf(x,y)\;dx\;dy\cong f(x_0,y_0) \Delta X,$$ where $\Delta X$ is the area of $X$.

$\quad$ (ii) The double integral of a function $g$ over a rectangle $A\subset\mathbb{R}^2 $ can be written as a limit of Riemann sums: $$\iint_Ag(x,y)\;dx\;dy=\lim_{\|\mathcal{P}\|\to0}\sum_{i=1}^k g(x_i,y_i)\Delta A_i,$$ where $\mathcal{P}=\{A_1,...,A_k\}$ is a partition of $A$, $(x_i,y_i)$ is a point in $A_i$ and $\Delta A_i$ is the area of $A_i$. Thus, for a sufficiently fine partition, $$\sum_{i=1}^k g(x_i,y_i)\Delta A_i\cong\iint_Ag(x,y)\;dx\;dy.$$

$\quad$ (iii) In the integral notation, $(x,y)$ is a dummy variable and thus $$\iint g(x,y)\;dx\;dy=\iint g(r,\theta)\;dr\;d\theta.$$

Assume that we want to calculate the double integral $$\iint_Q f(x,y)\;dx\;dy,\tag{0}$$ where $Q$ is an annular sector in $\mathbb{R}^2$ (see picture below).

[Figure: the annular sector $Q$]

We want to "transform" this integral into an integral over a rectangle. Why? Because we expect that the "transformed integral" will be easier to compute (by virtue of Fubini's theorem).

Note that the rectangle $A=\{(r,\theta)\in\mathbb{R}^2\mid a\leq r\leq b\text{ and }\alpha\leq \theta\leq\beta \}$ is transformed into the annular sector $Q$ by the mapping $T:\mathbb{R}^2\to\mathbb{R}^2$ given by $$T(r,\theta)=(r\cos\theta,r\sin\theta).\tag{1}$$

[Figure: the mapping $T$ sending the rectangle $A$ onto the annular sector $Q$]

So, $$Q=T(A).\tag{2}$$

We shall see that the mapping $T$ can be used to transform $(0)$ into an integral over the rectangle $A$, as we want. For this, we have to answer the

Big question: Given a rectangle $R$, what is the relationship between the area $\Delta R$ of $R$ and the area $\Delta T(R)$ of $T(R)$?

Answer: Look at the picture above. If $(c,d)$ is the center-point of $A$, so that $c=\frac{a+b}{2}$, then $$\text{area of } Q=\frac{b^2(\beta-\alpha)}{2}-\frac{a^2(\beta-\alpha)}{2}=\frac{1}{2}(b+a)(b-a)(\beta-\alpha)=c\,(b-a)(\beta-\alpha)=c(\text{area of } A).$$ The same computation applies to any rectangle $R$, and thus $$\Delta T(R)=p\Delta R,\tag{3}$$ where $p$ is the abscissa of the center-point of $R$.
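The identity behind $(3)$ is easy to check with concrete numbers. In the sketch below, the helper names and the sample values of $a$, $b$, $\alpha$, $\beta$ are arbitrary choices made for illustration.

```python
import math

def sector_area(a, b, alpha, beta):
    # Difference of two circular sectors of radii b and a
    return 0.5 * (b**2 - a**2) * (beta - alpha)

def rectangle_area(a, b, alpha, beta):
    # Area of the rectangle [a, b] x [alpha, beta] in the (r, theta) plane
    return (b - a) * (beta - alpha)

a, b, alpha, beta = 1.0, 2.0, 0.0, math.pi / 3
c = 0.5 * (a + b)  # r-coordinate of the center-point of the rectangle
print(sector_area(a, b, alpha, beta))
print(c * rectangle_area(a, b, alpha, beta))
```

Both printed values agree up to rounding, which is the statement that the area of $T(R)$ equals the center abscissa times the area of $R$.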

Now, let $\mathcal{P}=\{A_1,..., A_k\}$ be a partition of $A$ into very small subrectangles. For each $i=1,...,k$, let $(x_i,y_i)$ be the center-point of $A_i$. Then, $T(x_i,y_i)$ is a point in $T(A_i)$ and $$Q=T(A_1)\cup\cdots \cup T(A_k),\tag{4}$$ where the sets $T(A_i)$ overlap at most along their boundaries. It follows that \begin{align} \iint_Qf(x,y)\;dx\;dy&=\iint_{T(A)} f(x,y)\;dx\;dy\tag{by (2)}\\\\ &=\sum_{i=1}^k\iint_{T(A_i)} f(x,y)\;dx\;dy\tag{by (4)}\\\\ &\cong\sum_{i=1}^k f(T(x_i,y_i))\Delta T(A_i)\tag{by (i)}\\\\ &=\sum_{i=1}^kf(T(x_i,y_i))x_i\Delta A_i\tag{by (3)}\\\\ &\cong \iint_A f(T(x,y))\;x\;dx\;dy\tag{by (ii)}\\\\ &=\iint_A f(x\cos y,x\sin y)\;x\;dx\;dy\tag{by (1)}\\\\ &=\iint_A f(r\cos \theta,r\sin \theta)\;r\;dr\;d\theta\tag{by (iii)} \end{align}

So far, working with one sufficiently fine partition, we have deduced that $$\iint_Qf(x,y)\;dx\;dy\cong\iint_A f(r\cos \theta,r\sin \theta)\;r\;dr\;d\theta.$$ However, with an "epsilon argument" it is possible to prove that we can make this approximation as good as we want and thus the equality $(*)$ is indeed true (the details are worked out in the book). Quoting the said book again, the proof of $(*)$

is a matter of supplying enough epsilonics to convert the above discussion into a precise argument.
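The Riemann-sum step of the argument can be illustrated numerically. The sketch below is not taken from the book: the function names, the grid size `n`, and the test case are assumptions made for illustration. It sums $f(T(r_i,\theta_i))\,r_i\,\Delta A_i$ over the center-points of an $n\times n$ partition of $A$, for $f(x,y)=x$ on the quarter annulus with $a=1$, $b=2$, $\alpha=0$, $\beta=\pi/2$. The reference value $7/3$ comes from a purely Cartesian computation: the integral of $x$ over a quarter disk of radius $R$ is $\int_0^R x\sqrt{R^2-x^2}\,dx=R^3/3$, so the quarter annulus gives $(2^3-1^3)/3$.

```python
import math

def polar_riemann_sum(f, a, b, alpha, beta, n):
    """Midpoint sum of f(T(r, theta)) * r * (area of A_i) over an n-by-n grid."""
    dr = (b - a) / n
    dtheta = (beta - alpha) / n
    total = 0.0
    for i in range(n):
        r = a + (i + 0.5) * dr  # r-coordinate of the center of the cell
        for j in range(n):
            theta = alpha + (j + 0.5) * dtheta
            x, y = r * math.cos(theta), r * math.sin(theta)
            total += f(x, y) * r * dr * dtheta
    return total

def integrand(x, y):
    return x

exact = 7.0 / 3.0  # Cartesian value of the integral of x over the quarter annulus
for n in (10, 100):
    approx = polar_riemann_sum(integrand, 1.0, 2.0, 0.0, math.pi / 2, n)
    print(n, approx, abs(approx - exact))
```

The error shrinks as the partition is refined, which is what the epsilon argument makes precise.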