The dynamical variables in Lagrangian formalism

We do not treat $\dot q$ as an independent variable in the derivation of the Euler-Lagrange equations. The rough answer is that $q$ and $\dot q$ are independent as inputs to the Lagrangian, but become linked once we specify a path through configuration space - I expand on this in points 5 and 6.

I'll be quite formal in what follows, but perhaps the formality will be somewhat enlightening. First, a few preliminaries:

1: The state of an $N$-dimensional system consists of a point $$q\equiv(q_1,q_2,\ldots,q_N)\in \bf{Q}$$ where $\bf{Q}$ is called the configuration space corresponding to the system, and $q_i$ is the $i^{th}$ generalized coordinate.

2: A curve $\gamma$ through the configuration space is a map $$ \gamma : \mathbb{R} \rightarrow \bf{Q}$$ $$ t \mapsto \gamma(t)=\big(q_1(t),q_2(t),\ldots,q_N(t)\big)\equiv q_\gamma(t)$$ The curve is therefore parameterized by $t$, which we call the time. This describes how the state of the system evolves. Note that we will demand that $\gamma$ be at least twice differentiable.

3: At every point along $\gamma$, there exists a unique tangent vector $V_\gamma(t)$ given as follows: $$ V_\gamma : \mathbb{R} \rightarrow \bf{T_qQ}$$ $$ t \mapsto V_\gamma(t)=\big(\dot q_1(t),\dot q_2(t),\ldots,\dot q_N(t)\big)\equiv \dot q_\gamma(t)$$ $\mathbf{T_qQ}$ is called the tangent space to $\bf{Q}$ at the point $q$; here $q=\gamma(t)$, so the space in which $V_\gamma(t)$ lives changes as we move along the curve. I won't bother defining this rigorously, but the intuitive notion of a tangent space should be familiar if you're picking up Goldstein.

4: The disjoint union of all of the tangent spaces of $\bf{Q}$ is called the tangent bundle to $\bf{Q}$, and is denoted $\bf{TQ}$: $$ \bf{TQ} = \underset{q\in\bf{Q}}{\sqcup}\bf{T_qQ}$$ If $(q,v)$ is an element of the tangent bundle $\bf{TQ}$, then that means that $v$ is a tangent vector to some curve passing through the point $q$.

5: The Lagrangian is a function which takes three (or two, depending on your point of view) inputs - a point $(q,v) \in\bf{TQ}$, and a real number $t\in \mathbb{R}$ - and maps them to a real number: $$ L : \bf{TQ} \times \mathbb{R} \rightarrow \mathbb{R}$$ $$ (q,v, t) \mapsto L(q,v, t)$$ A crucial point is that $q$ does not determine $v$ - as far as $L$ is concerned, $q$ is just some point in $\bf{Q}$ and $v$ is the tangent vector to one of the infinity of curves that passes through $q$.

6: The action functional $S$ maps a curve $\gamma$ to a real number in the following way: $$S[\gamma] = \int L\big(q_\gamma(t),\dot q_\gamma(t), t\big) dt $$ To reiterate the above point, the Lagrangian has three slots - one for a point in configuration space, one for a tangent vector, and one for a real number. As far as $L$ is concerned, these three slots are independent, so we can take partial derivatives at our leisure.

When we execute the action functional, we walk along the curve $\gamma$. At each $t$, we feed $\gamma(t)\equiv q_\gamma(t)$ into the first slot, $V_\gamma(t) \equiv \dot q_\gamma(t)$ into the second slot, and $t$ into the third slot. But it can't be emphasized enough that the Lagrangian itself has no idea that the three inputs have anything whatsoever to do with one another.
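To make the "three independent slots" picture concrete, here is a minimal Python sketch. It assumes a harmonic-oscillator Lagrangian with unit mass and spring constant, which is my choice for illustration and not something fixed by the argument above; the helper names `partial` and `action` are likewise hypothetical.

```python
import math

def lagrangian(q, v, t, m=1.0, k=1.0):
    """Assumed example: harmonic oscillator. q, v, t are three unrelated numbers."""
    return 0.5 * m * v**2 - 0.5 * k * q**2

def partial(f, args, slot, h=1e-6):
    """Central-difference partial derivative of f with respect to one slot."""
    up = list(args)
    up[slot] += h
    dn = list(args)
    dn[slot] -= h
    return (f(*up) - f(*dn)) / (2 * h)

def action(q_of_t, v_of_t, t0, t1, n=2000):
    """Midpoint-rule approximation of the integral of L(q(t), q'(t), t) dt."""
    dt = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * dt
        total += lagrangian(q_of_t(t), v_of_t(t), t) * dt
    return total

# L accepts any (q, v, t); nothing ties v to q at this level.
point = (2.0, -3.0, 0.0)
dL_dq = partial(lagrangian, point, slot=0)  # analytically -k*q = -2
dL_dv = partial(lagrangian, point, slot=1)  # analytically  m*v = -3

# Only the action links the slots, by feeding in a curve and its derivative.
S = action(math.sin, math.cos, 0.0, 1.0)    # analytically sin(2)/4
```

Note that `partial` never needs a curve: it varies one slot while holding the other two fixed. The curve only enters in `action`, where the second argument is chosen to be the derivative of the first.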

Now that that's out of the way, we can get down to business. We seek some $\gamma$ for which the action functional is stationary. Intuitively, we think "take the derivative and set it to zero," but at this stage it's not really clear how to take a derivative with respect to a curve.

Instead, we'll do the following. Denote the correct (but unknown) curve $\gamma_c$. Then a nearby curve $\gamma$ can be written as the "sum" of $\gamma_c$ and some "error" $\epsilon\eta$, where $\eta$ vanishes at the endpoints of the integral and the sum is defined component-wise. In other words, at some time $t$,

$$q_\gamma(t) = q_c(t)+\epsilon\eta(t) \equiv \big(q_{c1}(t)+\epsilon\cdot\eta_1(t),q_{c2}(t)+\epsilon\cdot\eta_2(t),\ldots,q_{cN}(t)+\epsilon\cdot\eta_N(t)\big)$$

while the tangent vector (also called the generalized velocity) becomes $$\dot q_\gamma(t) = \dot q_c(t)+\epsilon\cdot \eta'(t)\equiv \big(\dot q_{c 1}(t)+\epsilon\cdot\eta_1'(t),\dot q_{c 2}(t)+\epsilon\cdot\eta_2'(t),\ldots,\dot q_{cN}(t)+\epsilon\cdot\eta_N'(t)\big)$$

where $\epsilon\in \mathbb{R}$. Rather than worry about the details of functional derivatives, we can seek the path $\gamma_c$ for which the action integral is stationary with respect to changes in $\epsilon$ at $\epsilon=0$ (where $\gamma=\gamma_c$), for every admissible $\eta$: $$ \left.\frac{dS[\gamma]}{d\epsilon}\right|_{\epsilon=0} = 0$$

The action functional becomes $$S[\gamma] = \int_A^B L\big(q_\gamma(t),\dot q_\gamma(t),t\big) dt=\int_A^B L\big(q_c(t)+\epsilon\cdot\eta(t),\dot q_c(t)+\epsilon\cdot\eta'(t),t\big) dt$$

Differentiating with respect to $\epsilon$ gives $$\frac{dS[\gamma]}{d\epsilon} = \int_A^B\sum_{i=1}^N\left[ \frac{\partial L}{\partial q_{\gamma i}}\eta_i(t) + \frac{\partial L}{\partial \dot q_{\gamma i}} \eta_i'(t) \right]dt$$

We now recognize that

$$ \frac{\partial L}{\partial \dot q_{\gamma i}} \eta_i' (t) = \left[\frac{\partial L}{\partial \dot q_{\gamma i}} \eta_i (t)\right]' - \left(\frac{d}{dt}\frac{\partial L}{\partial \dot q_{\gamma i}}\right) \eta_i (t)$$

and since $\eta$ vanishes at the endpoints, the total-derivative term integrates to zero, so we find that

$$\frac{dS[\gamma]}{d\epsilon} = \int_A^B\sum_{i=1}^N\left[\frac{\partial L}{\partial q_{\gamma i}} - \frac{d}{dt}\frac{\partial L}{\partial \dot q_{\gamma i}}\right]\eta_i(t) dt $$

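Because this quantity must vanish at $\epsilon=0$ for arbitrary, independent choices of the $\eta_i$, it follows (by the fundamental lemma of the calculus of variations) that each bracketed coefficient must vanish everywhere along the path, and so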

$$\frac{d}{dt}\frac{\partial L}{\partial \dot q_{\gamma i}} = \frac{\partial L}{\partial q_{\gamma i}}$$

This gives us the Euler-Lagrange equations which allow us to solve for the proper path in terms of the generalized coordinates $q_{\gamma i}$.
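The stationarity condition can be checked numerically. The sketch below again assumes the unit-mass, unit-spring harmonic oscillator (my choice, not part of the derivation), for which $q_c(t)=\sin t$ solves the Euler-Lagrange equation, and uses the hypothetical perturbation $\eta(t)=t(T-t)$, which vanishes at both endpoints. It compares $dS/d\epsilon$ at $\epsilon=0$ for the true path against a straight line with the same endpoints.

```python
import math

def action(q, v, t0, t1, n=2000):
    """Midpoint-rule action for the assumed L = v^2/2 - q^2/2."""
    dt = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * dt
        total += (0.5 * v(t)**2 - 0.5 * q(t)**2) * dt
    return total

T = 1.0

def eta(t):
    """Perturbation that vanishes at t = 0 and t = T."""
    return t * (T - t)

def eta_dot(t):
    return T - 2.0 * t

def dS_deps(q, v, eps=1e-3):
    """Central-difference estimate of dS/d(eps) at eps = 0."""
    def S(e):
        return action(lambda t: q(t) + e * eta(t),
                      lambda t: v(t) + e * eta_dot(t), 0.0, T)
    return (S(eps) - S(-eps)) / (2 * eps)

# True path: q = sin t satisfies q'' = -q, so the slope should be ~0.
slope_true = dS_deps(math.sin, math.cos)

# Straight line with the same endpoints; it does not solve the E-L equation.
# For this eta the slope works out analytically to -sin(1)/12.
line = math.sin(T)
slope_line = dS_deps(lambda t: line * t, lambda t: line)
```

The perturbed curve and its velocity are shifted together (`e * eta` and `e * eta_dot`), which is exactly the step where the curve ties the two slots of $L$ to each other.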

The specification of a curve, which links the generalized coordinates to the generalized velocities, happens at the level of the action, not at the level of the Lagrangian. As far as $L$ is concerned, $q(t)$ and $\dot q(t)$ have nothing to do with one another and can be chosen completely independently. That's the difference between feeding $L$ the number $q(t)$ as opposed to the function $q$.

The general Lagrangian formalism is developed in a manifold $j^1(E)$ with the structure of a jet bundle constructed out of a fiber bundle $E \to \mathbb R$.

In other words $E$ is locally the product of $Q$ and $\mathbb R$, where $Q$ is a manifold where configurations of the system are described at every time $t \in \mathbb R$.

$E$ is covered by local coordinate patches $t, q^1,\ldots, q^n$ where $t$ is the temporal coordinate over the basis $\mathbb R$ of the fiber bundle $E \to \mathbb R$ and $q^1,\ldots, q^n$ cover the fibers $Q_t$ (diffeomorphic to $Q$).

The first jet extension $j^1(E)$ over $\mathbb R$ enlarges each fiber $Q_t$ by adding a further factor $\mathbb R^n$ covered by jet coordinates $\dot{q}^1,\ldots, \dot{q}^n$. These are independent of the $q^1,\ldots, q^n$, but they are identified with $\frac{dq^1}{dt}, \cdots, \frac{dq^n}{dt}$ as soon as a motion $t \mapsto (t, q^1(t), \ldots, q^n(t))$ is given. In other words, $(t, q^1, \ldots, q^n, \dot{q}^1,\ldots, \dot{q}^n)$ fixes the kinetic state of the system at time $t$. Here the configuration coordinates and the dotted coordinates are completely independent. The fibers of $j^1(E)$ are therefore $2n$-dimensional manifolds $A_t$, the space of kinetic states at time $t$, diffeomorphic to a canonical fiber $A$ covered by local coordinates $q^1, \ldots, q^n, \dot{q}^1,\ldots, \dot{q}^n$.

In view of this structure, when we change local coordinates and pass to $t', q^{'1},\ldots, q^{'n}, \dot{q}^{'1},\ldots, \dot{q}^{'n}$, the transformation rules are $$t' = t+c\tag{1}$$ $$q^{'k} = q^{'k}(t, q^1,\ldots, q^n)\tag{2}$$ $$\dot{q}^{'k} = \frac{\partial q^{'k}}{\partial t} + \sum_{j=1}^n \frac{\partial q^{'k}}{\partial q^j} \dot{q}^j\tag{3}$$ and the inverse relations have the same structure.

You see that the third equation is compatible with the interpretation of $\dot{q}$ as time derivative of $q$. This interpretation is only formal because that derivative cannot be computed when a point $a\in A_t$ is given: to compute the said derivative we would need a curve (a section) passing through $a$, not only $a$ itself.
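As a small illustration of rule (3), consider a one-dimensional, time-dependent change of coordinates to a uniformly accelerated frame, $q' = q - \tfrac12 a t^2$. This example and the constant `A` are my own assumptions, chosen only to make the point concrete. The sketch applies (3) at an arbitrary point $(t, q, \dot q)$ with no curve in sight, and then checks that along a curve the result agrees with an honest time derivative.

```python
import math

A = 2.0  # hypothetical constant acceleration of the primed frame

def q_prime(t, q):
    """Assumed time-dependent coordinate change q' = q - A t^2 / 2."""
    return q - 0.5 * A * t**2

def qdot_prime(t, q, qdot, h=1e-6):
    """Rule (3): uses only the point (t, q, qdot), never a curve."""
    d_dt = (q_prime(t + h, q) - q_prime(t - h, q)) / (2 * h)
    d_dq = (q_prime(t, q + h) - q_prime(t, q - h)) / (2 * h)
    return d_dt + d_dq * qdot

# (a) An arbitrary kinetic state: q and qdot are chosen independently.
v_new = qdot_prime(1.0, 5.0, -7.0)  # analytically -7 - A*1 = -9

# (b) Along the curve q(t) = sin t, rule (3) matches d/dt of q'(t, q(t)).
t0 = 0.3
along = qdot_prime(t0, math.sin(t0), math.cos(t0))
h = 1e-5
direct = (q_prime(t0 + h, math.sin(t0 + h))
          - q_prime(t0 - h, math.sin(t0 - h))) / (2 * h)
```

In case (a) there is nothing to differentiate with respect to time, which is the sense in which the interpretation of $\dot q$ as a derivative is only formal. In case (b) a curve through the point is supplied and the two computations agree.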

The Euler-Lagrange equations are first-order equations induced by a scalar function ${\cal L} : j^1(E) \to \mathbb R$. In every local chart they determine a section $t \mapsto \gamma(t) \in j^1(E)$, in coordinates $$t \mapsto (t, q(t), \dot{q}(t))\:, $$ that solves, for $k=1,\ldots, n$, $$\frac{d}{dt} \frac{\partial {\cal L}}{\partial \dot{q}^k}- \frac{\partial \cal L}{\partial q^k}=0\:,$$ $$\frac{dq^k}{dt} = \dot{q}^k(t)\:.$$ You see that $\dot{q}$ turns out to be the time derivative of $q$ only along the solutions of the E-L equations; otherwise $q$ and $\dot{q}$ are independent variables.
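The first-order form can be integrated directly. The sketch below assumes the Lagrangian ${\cal L} = \tfrac12 M \dot q^2 - \tfrac12 K q^2$ with $n=1$ (my choice of example), for which the pair of equations above reduces to $dq/dt = \dot q$ and $d\dot q/dt = -Kq/M$. It uses a hand-written classical Runge-Kutta step; the function names are hypothetical.

```python
import math

M, K = 1.0, 1.0  # assumed mass and spring constant

def rhs(t, q, qdot):
    """First-order E-L system for the assumed L = M qdot^2/2 - K q^2/2."""
    return qdot, -K * q / M

def rk4_step(t, q, qdot, dt):
    """One classical fourth-order Runge-Kutta step for the pair (q, qdot)."""
    k1q, k1v = rhs(t, q, qdot)
    k2q, k2v = rhs(t + dt / 2, q + dt / 2 * k1q, qdot + dt / 2 * k1v)
    k3q, k3v = rhs(t + dt / 2, q + dt / 2 * k2q, qdot + dt / 2 * k2v)
    k4q, k4v = rhs(t + dt, q + dt * k3q, qdot + dt * k3v)
    q_new = q + dt / 6 * (k1q + 2 * k2q + 2 * k3q + k4q)
    v_new = qdot + dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return q_new, v_new

def integrate(q0, qdot0, t_end, n=1000):
    """Evolve an initial kinetic state (q0, qdot0) from t = 0 to t_end."""
    dt = t_end / n
    t, q, qdot = 0.0, q0, qdot0
    for _ in range(n):
        q, qdot = rk4_step(t, q, qdot, dt)
        t += dt
    return q, qdot

# q0 and qdot0 are independent data: any point of the fiber A_0 is allowed.
q1, v1 = integrate(0.0, 1.0, 1.0)  # exact solution: q = sin t, qdot = cos t
```

The initial values `q0` and `qdot0` are picked freely, as independent coordinates of a kinetic state; only along the computed solution does `qdot` track the time derivative of `q`.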

ADDED COMMENT. Why jet bundles?

The overall idea is finding a mathematical structure that encodes the idea that

$q$ and $\dot{q}$ are independent variables, and they become dependent ($\dot{q}$ is the time derivative of $q$) along every solution of the equations of motion.

The first idea is to model the space of kinetic states on the tangent bundle $TQ$ of the configuration space, where $Q$ is covered by Lagrangian coordinate patches $q^1,\ldots, q^n$. Here $\dot{q}^1, \ldots, \dot{q}^n$ are the components of tangent vectors at the point with coordinates $q^1,\ldots, q^n$ (interpreted as tangent vectors to curves through that point parametrized by the time coordinate).

This is nice, but in this picture coordinate transformations that depend explicitly on time are mathematically unnatural, even though they are physically necessary (think of Lagrangian coordinates at rest in two different reference frames, one inertial and the other non-inertial).

A way out is to use, as the spacetime of kinetic states, the Cartesian product $A = \mathbb R \times TQ$, where $\mathbb R$ is the temporal axis. Admissible coordinates on $A$ are then $(t,q^1,\ldots, q^n, \dot{q}^1, \ldots, \dot{q}^n)$, where $t\in \mathbb R$, $q^1,\ldots, q^n$ are coordinates on $Q$, and $\dot{q}^1, \ldots, \dot{q}^n$ are coordinates on each fiber of $TQ$. In classical physics the coordinate $t$ is required to coincide with absolute time, so it is fixed only up to an additive constant. This explains why we restricted the possible changes of temporal coordinate to the elementary form (1).

This picture can be implemented already at the level of space of configurations, defining the spacetime of configurations as $E: =\mathbb R \times Q$.

In practice this construction is effective, but it suffers from a conceptual drawback: every coordinate change (1)-(3) may use a different realization of $E$ (and $A$) as a Cartesian product, as is evident from the transformation rules (2) (and (3)), and in general no natural choice exists.

So we should look for a structure that looks like a Cartesian product (at least locally), but whose Cartesian decomposition is not canonical, and which admits an adapted atlas of local charts whose transformation rules are those stated in (1)-(3).

The first step in removing the fixed Cartesian product structure, restricting to (1) and (2) only, is to assume from the outset that the spacetime of configurations is not $\mathbb R \times Q$ but a manifold which locally looks like that product, without fixing any particular choice of this decomposition.

This structure exists and is well known in mathematics: it is a fiber bundle $E \to \mathbb R$ with canonical fiber diffeomorphic to $Q$. The atlas of local coordinates adapted to the bundle structure (with a preferred global coordinate, defined up to an additive constant, on the base $\mathbb R$) is made of local charts $t, q^1,\ldots, q^n$ transforming exactly as in (1)-(2).

It remains to extend this structure further to encompass the kinetic information. The manifold $A= j^1(E)$ is a very good candidate. It is nothing but $E$ with the addition of $n = \dim (Q)$ coordinates $\dot{q}^1, \ldots, \dot{q}^n$ to each fiber for every natural coordinate patch $t, q^1,\ldots, q^n$, with the requirement that (3) holds under changes of coordinates. This is because, in the definition of a jet bundle, the added dot coordinates must be interpreted as the components of tangent vectors of sections in $E$, i.e., the components of all possible tangent vectors to curves $\mathbb R \ni t \mapsto (q^1(t), \ldots, q^n(t))$ passing through each point of $E$.

  1. Part of OP's question seems to be a matter of semantics: If a Lagrangian $$L(q^1,\ldots, q^n, v^1,\ldots, v^n,t)\tag{1}$$ has $n$ independent generalized position variables $q^1,\ldots, q^n$, i.e. the configuration space is $n$-dimensional, then the system is said to have $n$ degrees of freedom (DOF), cf. e.g. this Phys.SE post.

    This definition of DOF is used despite the fact that the Lagrange equations are $n$ coupled 2nd-order ODEs, and hence the full solution has $2n$ integration constants, i.e. the number $n$ of DOF is defined as half the number of integration constants!

  2. Another issue is that the generalized velocities $v^1, \ldots, v^n$ are independent variables in the Lagrangian (1), but they are dependent variables in the action $$S[q^1,\ldots, q^n; t_i,t_f]~:=~ \int_{t_i}^{t_f}\!\mathrm{d}t~ L(q^1,\ldots, q^n, \dot{q}^1,\ldots, \dot{q}^n,t).\tag{2}$$ This is explained e.g. in this Phys.SE post.