Momentum is a cotangent vector?

I) Disclaimer: In this answer we will use the (traditional) physicist's definition of tensors using indices and their transformation properties under coordinate transformations. Moreover, let us suppress time dependence $t$ for simplicity.

II) Let the manifold $Q$ be the configuration space. The Lagrangian $L:TQ\to \mathbb{R}$ transforms as a scalar

$$L~~\longrightarrow~~ L^{\prime}~=~L, \tag{1}$$

the velocity $v^i$ transforms as a vector

$$v^i~~\longrightarrow~~ v^{\prime j}~=~\frac{\partial q^{\prime j}}{\partial q^i}v^i,\tag{2}$$

the Lagrangian canonical momentum

$$p_i~:=~ \frac{\partial L}{\partial v^i}\tag{3}$$

transforms as a covector

$$p_i~~\longrightarrow~~ p^{\prime}_j~=~\frac{\partial q^i}{\partial q^{\prime j}}p_i,\tag{4}$$

under general coordinate transformations

$$ q^i~\longrightarrow ~q^{\prime j}~=~ f^j(q)\tag{5}$$

in the configuration space $Q$. Eq. (4) follows from the chain rule.
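
The chain-rule step behind eq. (4) can be spelled out as follows, using only eqs. (1)-(3) and (5). Since the Jacobian in eq. (2) depends on $q$ only, inverting eq. (2) at fixed $q$ gives

$$v^i~=~\frac{\partial q^i}{\partial q^{\prime j}}v^{\prime j}\qquad\Longrightarrow\qquad \frac{\partial v^i}{\partial v^{\prime j}}~=~\frac{\partial q^i}{\partial q^{\prime j}},$$

and therefore, with $L^{\prime}=L$ from eq. (1),

$$p^{\prime}_j~=~\frac{\partial L^{\prime}}{\partial v^{\prime j}}~=~\frac{\partial L}{\partial v^i}\frac{\partial v^i}{\partial v^{\prime j}}~=~\frac{\partial q^i}{\partial q^{\prime j}}p_i.$$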

III) A point in the tangent bundle is of the form

$$(q,v)~\in~TQ,\qquad v~=~v^i \frac{\partial}{\partial q^i}.\tag{6} $$

Note that the velocity $v$ is an independent variable, which transforms as a vector (2) under general coordinate transformations (5) in the configuration space $Q$.

IV) The Lagrangian canonical momentum (3) can be viewed as a section

$$TQ ~\ni~ (q,v) ~\stackrel{p}{\mapsto} ~(q,v; p_i\mathrm{d}q^i)~\in~T^{\ast}TQ \tag{7}$$

in the bundle $T^{\ast}TQ \to TQ$.

V) Finally, let us for completeness and comparison mention the Hamiltonian canonical momentum (also denoted $p$) in the case where the phase space $M$ is the cotangent bundle $M=T^{\ast}Q$. In this case the Hamiltonian canonical momentum $p$ is an independent variable, which transforms as a covector (4) under general coordinate transformations (5) in the configuration space $Q$. A point in the cotangent bundle is of the form

$$(q,p)~\in~T^{\ast}Q,\qquad p~=~p_i\mathrm{d}q^i.\tag{8} $$
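
The transformation rules (2) and (4) can be checked numerically on a simple example. The sketch below is an illustration only: it assumes a free particle in the plane, takes the unprimed coordinates to be Cartesian $(x,y)$ and the primed ones to be polar $(r,\theta)$, and computes the momenta (3) by finite differences. The mass value, the function names and the sample point are all hypothetical choices, not part of the answer above.

```python
import math

M = 2.0  # particle mass (illustrative value, an assumption)

def lagrangian_cartesian(x, y, vx, vy):
    """Free particle in the plane: L = (m/2)(vx^2 + vy^2)."""
    return 0.5 * M * (vx * vx + vy * vy)

def lagrangian_polar(r, th, vr, vth):
    """The same scalar L written in polar coordinates (r, theta)."""
    return 0.5 * M * (vr * vr + r * r * vth * vth)

def momenta(L, q, v, h=1e-6):
    """Eq. (3): p_i = dL/dv^i, approximated by central finite differences."""
    p = []
    for i in range(len(v)):
        v_plus, v_minus = list(v), list(v)
        v_plus[i] += h
        v_minus[i] -= h
        p.append((L(*q, *v_plus) - L(*q, *v_minus)) / (2 * h))
    return p

def check(r=1.5, th=0.7, vr=0.3, vth=-1.1):
    # Unprimed coordinates q = (x, y); primed coordinates q' = (r, theta).
    x, y = r * math.cos(th), r * math.sin(th)
    # Jacobian J[i][j] = dq^i/dq'^j for x = r cos(theta), y = r sin(theta).
    J = [[math.cos(th), -r * math.sin(th)],
         [math.sin(th),  r * math.cos(th)]]
    # Eq. (2), read from primed to unprimed: v^i = (dq^i/dq'^j) v'^j.
    v_pol = [vr, vth]
    v_cart = [sum(J[i][j] * v_pol[j] for j in range(2)) for i in range(2)]
    # Momenta computed independently in each coordinate system.
    p_cart = momenta(lagrangian_cartesian, (x, y), v_cart)
    p_pol = momenta(lagrangian_polar, (r, th), v_pol)
    # Eq. (4): p'_j = (dq^i/dq'^j) p_i.
    p_pol_from_rule = [sum(J[i][j] * p_cart[i] for i in range(2)) for j in range(2)]
    # The pairing p_i v^i should not depend on the coordinates used.
    pairing_cart = sum(p * v for p, v in zip(p_cart, v_cart))
    pairing_pol = sum(p * v for p, v in zip(p_pol, v_pol))
    return p_pol, p_pol_from_rule, pairing_cart, pairing_pol

if __name__ == "__main__":
    print(check())
```

Up to finite-difference rounding, the polar momenta obtained directly from eq. (3) agree with those obtained from the Cartesian momenta via eq. (4), and the contraction $p_i v^i$ comes out the same in both coordinate systems, as expected for a covector paired with a vector.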


The momentum is a covector because it is a gradient, and gradients are always covariant. It does what it says on the tin. However, you are right that this is a subtle point and it's not particularly clear at first sight.

For a Lagrangian of the form $L=T-V$ with $V$ independent of $\dot q$, the canonical momentum is given by $$ p=\frac{\partial L}{\partial \dot q}=\frac{\partial T}{\partial \dot q}. $$ This derivative measures how much $T$ changes under a small change $\delta\dot q$ in $\dot q$, when the change is small enough that a linear approximation to $T$ suffices: $\delta T\approx p\,\delta\dot q$. In other words, $p$ acts as a linear functional on $\delta\dot q$.
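
As a worked illustration, assume a kinetic term that is quadratic in the velocities, with a symmetric mass matrix $m_{ij}(q)$ (this specific form is an assumption made here for concreteness):

$$T~=~\tfrac{1}{2}m_{ij}(q)\,\dot q^i\dot q^j \qquad\Longrightarrow\qquad T(\dot q+\delta\dot q)~=~T(\dot q)+m_{ij}\dot q^j\,\delta\dot q^i+\tfrac{1}{2}m_{ij}\,\delta\dot q^i\delta\dot q^j.$$

The term linear in $\delta\dot q$ identifies $p_i=m_{ij}\dot q^j$, so that $\delta T\approx p_i\,\delta\dot q^i$: the momentum has a lower index precisely because it is contracted with the velocity increment.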

This means that $p$ is a functional over increments in $\dot q$ rather than a functional over $\dot q$ itself. This is of course correct: if you have a configuration space $Q$, then Lagrangian mechanics takes place in $M=TQ$, which is the space of all configurations $q$ and the corresponding velocities $\dot q$. Note here that $TM=TTQ$ is, precisely, the space of increments in velocity (along with the velocities themselves as increments in the position). An increment in $\dot q$ at fixed $q$ is a vertical vector in $TM$, and the vertical vectors at a point $(q,\dot q)$ can be identified with the tangent space $T_qQ$. The momentum is therefore a linear form on $T_qQ$, i.e. an element of $T_q^*Q$, and Hamiltonian mechanics takes place in the cotangent bundle $T^*Q$.