Do generalized coordinates have to be orthogonal?

There is not even a way to define whether such coordinates are orthogonal. There is no natural way to define an inner product on the space of tangent vectors in the coordinate space. Even if we did have such a definition, it would not in general be possible to define the coordinates in such a way that they were orthogonal everywhere. For example, you can't do this for a particle moving on the surface of a sphere.


Generalized coordinates have to be related to base coordinates in the following way: the map has to be invertible. Ideally the map is a bijection, that is, every point in the original coordinate system maps to exactly one point in the new system and vice versa, but being one-to-one on the relevant subsets of the two is sufficient. The map also has to be differentiable ("smooth"), to an order that depends on the Lagrangian.

That's it. If you satisfy invertibility and smoothness, anything goes.

Handling external forces when you change coordinate systems is a slightly different matter. The problem you're running into is that you have to add the external forcing term to the Lagrangian before you do the change of coordinates. So then your Lagrangian looks like: \begin{align} L &= L_{\mathrm{free}} - \theta_2 \tau_2(t) \end{align} (check the sign of the $\theta_2 \tau_2$ term).

Now, when you change coordinates you get: \begin{align} L &= L_{\mathrm{free}} - (q_2 - q_1) \tau_2(t), \end{align} and everything proceeds as before.
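To make the bookkeeping explicit, here is a short worked step, taking the forcing term exactly as written above (so its overall sign is still subject to the check already flagged). The label $L_{\mathrm{ext}}$ is shorthand introduced here for that term only: \begin{align} L_{\mathrm{ext}} &= -(q_2 - q_1)\,\tau_2(t), \\ \frac{\partial L_{\mathrm{ext}}}{\partial q_1} &= +\tau_2(t), \\ \frac{\partial L_{\mathrm{ext}}}{\partial q_2} &= -\tau_2(t). \end{align} The two contributions are equal and opposite. Flipping the overall sign of the forcing term flips both of them, but a $\tau_2$ term appears in the $q_1$ equation either way.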

This seems strange to me because now it says that $-\tau_2$ is being applied to the first coordinate $\theta_1$ even though the external torque is only on $\theta_2$.

That's because you're mentally reversing only part of the coordinate change. By itself, $q_1$ is not equivalent to $\theta_1$. You have to keep both pieces of context in mind: (a) the rest of the transformation, and (b) the rest of the Lagrangian.

The inverse transformation is: \begin{align} \left[\begin{array}{c} \theta_1 \\ \theta_2\end{array}\right] & = \left[\begin{array}{c} q_1 \\ q_2 - q_1\end{array}\right]. \end{align} Notice that $q_1$ feeds into both of the thetas.

Next, examine your equations of motion. I'm not going to do the derivation for you, but I'd be surprised to hear that you end up with cleanly separated equations of the form $\ddot{q}_i = \ldots$.

Edit: after thinking some more, I have an intuitive explanation. $q_1$ and $\theta_1$ are numerically equal, but they mean different things. In the $\theta$ system the angles describe the positions of the arms independently with respect to some external standard direction, and so describe the motion of the two bodies independently (specifically, in the kinetic energy; their coupling is purely in the potential term). In the $q$ system, $q_1$ describes the position of the double pendulum using the first arm as if the whole double pendulum were rigid, and $q_2$ defines the angle of the outer pendulum arm with respect to the inner arm. Since that gives you a moving reference for $q_2$, it couples the kinetic terms (i.e. if your initial kinetic energy was $T=\frac{m_1}{2}\dot{\theta}_1^2 + \frac{m_2}{2}\dot{\theta}_2^2$, your new one is $T=\frac{m_1}{2}\dot{q}_1^2 + \frac{m_2}{2}\left(\dot{q}_2 - \dot{q}_1\right)^2$). Also, since $q_1$ covers the whole pendulum, the external torque acts on it, too.
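Expanding the square in that illustrative kinetic energy makes the coupling visible. This is only a sketch for the simplified $T$ quoted above, not for the full double-pendulum kinetic energy: \begin{align} T &= \frac{m_1}{2}\dot{q}_1^2 + \frac{m_2}{2}\left(\dot{q}_2 - \dot{q}_1\right)^2 \\ &= \frac{m_1 + m_2}{2}\dot{q}_1^2 + \frac{m_2}{2}\dot{q}_2^2 - m_2\,\dot{q}_1\dot{q}_2, \\ \frac{\partial T}{\partial \dot{q}_1} &= (m_1 + m_2)\,\dot{q}_1 - m_2\,\dot{q}_2, \\ \frac{\partial T}{\partial \dot{q}_2} &= m_2\left(\dot{q}_2 - \dot{q}_1\right). \end{align} The cross term $-m_2\,\dot{q}_1\dot{q}_2$ is what mixes the two accelerations: both $\ddot{q}_1$ and $\ddot{q}_2$ show up in each Euler-Lagrange equation.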

Why is invertibility important? The map between your coordinates needs to be invertible for two reasons. First, once you've worked out the dynamics in the new coordinate system, you might want to translate that back into the old one.

Second, bad things happen in the math if the system runs across a point where the map isn't invertible. Consider a point mass undergoing uniform motion along the $x$-axis. The motion is nice and simple: $x=vt$ and $y=0$. Now, change to polar coordinates. You'll get $r = |vt|$ and $\theta=-\pi\Theta(-vt)$, with $\Theta(a) \equiv 0$ if $a < 0$ and $1$ if $a > 0$. Notice how something violent happens in the coordinates at the origin, precisely where the map from $(r,\theta)$ to $(x,y)$ becomes many-to-one (i.e. at $r=0$ you're at the origin, no matter what $\theta$ is).
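The jump is easy to see numerically. Below is a minimal Python sketch; the function names are made up for illustration, and it assumes $v = 1$. Note that `math.atan2` picks the branch $\theta = +\pi$ on the negative $x$-axis rather than the $-\pi$ used in the formula above. The two differ by $2\pi$ and label the same direction.

```python
import math

def polar_from_cartesian(x, y):
    """Return (r, theta), with theta in [-pi, pi] as given by math.atan2."""
    return math.hypot(x, y), math.atan2(y, x)

def uniform_motion_polar(v, times):
    """Polar coordinates of a point moving as x = v*t, y = 0."""
    return [polar_from_cartesian(v * t, 0.0) for t in times]

# Sample times on both sides of t = 0, where the path crosses the origin.
times = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
track = uniform_motion_polar(1.0, times)

for t, (r, theta) in zip(times, track):
    print(f"t={t:+.1f}  r={r:.1f}  theta={theta:.4f}")
```

In Cartesian coordinates nothing special happens at $t=0$, but in the printed table $\theta$ switches from $\pi$ to $0$ between the last negative and first positive time, and $r$ has a kink there.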

Side note: the polar–Euclidean transformation is an example of a coordinate transform that is not a bijection. An infinite number of $\theta$ values (differing by multiples of $2\pi$) map to each $(x,y)$ pair. This can cause problems with interpretability if the fact is forgotten, but doesn't cause any mathematical problems I can think of at this moment.

Solving the differential equations in the presence of a point-like problem in the coordinates is possible; it just requires more advanced tools. It's better to make sure any point-like problems are irrelevant, if possible. For an example of a real-world problem of this type with a mechanical computer, see the phenomenon of gimbal lock.

Why is smoothness important? This one is easier to explain. It all comes down to two words: chain rule. First, your Lagrangian is going to have kinetic terms expressed in the original coordinate system. To find your new kinetic terms, you need to apply the chain rule to the transformation. That is, for a transformation with no explicit time dependence, \begin{align} x_i & = f_i (q_1, \ldots) \Rightarrow \\ \dot{x}_i & = \sum_{j} \frac{\partial f_i}{\partial q_j} \dot{q}_j. \end{align}

The second reason to desire differentiability of the transformation is technical: it makes proving the Euler-Lagrange equations are equivalent more straightforward (again, using the chain rule).
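As a concrete check of the chain-rule step for velocities, here is a small Python sketch using the polar map $x = r\cos\theta$, $y = r\sin\theta$ as the example transformation. The function names and the sample numbers are chosen here purely for illustration. It computes $\dot{x}_i = \sum_j \frac{\partial f_i}{\partial q_j}\dot{q}_j$ from the Jacobian and compares it with a finite-difference derivative along the same direction.

```python
import math

def to_cartesian(q):
    """Example transformation f: (r, theta) -> (x, y)."""
    r, theta = q
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian(q):
    """Matrix of partial derivatives d f_i / d q_j for the polar map."""
    r, theta = q
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def velocity_chain_rule(q, q_dot):
    """x_dot_i = sum_j (d f_i / d q_j) * q_dot_j."""
    J = jacobian(q)
    return tuple(sum(J[i][j] * q_dot[j] for j in range(2)) for i in range(2))

def velocity_finite_difference(q, q_dot, h=1e-6):
    """Central-difference estimate of d/dt f(q + t*q_dot) at t = 0."""
    q_plus = tuple(qi + h * qd for qi, qd in zip(q, q_dot))
    q_minus = tuple(qi - h * qd for qi, qd in zip(q, q_dot))
    x_plus, x_minus = to_cartesian(q_plus), to_cartesian(q_minus)
    return tuple((a - b) / (2 * h) for a, b in zip(x_plus, x_minus))

def kinetic_energy(m, q, q_dot):
    """T = m/2 * (x_dot^2 + y_dot^2), written in terms of (q, q_dot)."""
    vx, vy = velocity_chain_rule(q, q_dot)
    return 0.5 * m * (vx * vx + vy * vy)

q, q_dot = (2.0, 0.7), (0.3, -1.1)   # sample state, arbitrary values
print(velocity_chain_rule(q, q_dot))
print(velocity_finite_difference(q, q_dot))
print(kinetic_energy(1.5, q, q_dot))
```

For this map the kinetic energy should reduce to the familiar $\frac{m}{2}\left(\dot{r}^2 + r^2\dot{\theta}^2\right)$, which gives a quick sanity check on the Jacobian.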

When you have both invertibility and smoothness, that is sufficient to prove, using the chain rule, that the Euler-Lagrange equations of motion obtained in the two coordinate systems are equivalent.