Why exactly do we say $L = L(q, \dot{q})$ and $H = H(q, p)$?

We should abandon the "naive" language of functions depending on coordinates and consider functions as maps between mathematical spaces, which are only expressed in local coordinates after their domains have been defined.

The starting point for both the Lagrangian and the Hamiltonian formalism is a configuration space $Q$, whose coordinates are called $q^i$. It should be thought of as the space of positions of the system under consideration. The two formalisms now immediately take different paths: Lagrangian mechanics takes place on the tangent bundle $TQ$, Hamiltonian mechanics on the cotangent bundle $T^\ast Q$. The local coordinates on $TQ$ are denoted $(q^i,\dot{q}^i)$, the local coordinates on $T^\ast Q$ are $(q^i,p_i)$. Note that, since there is no metric on $Q$, you do not have a canonical identification of tangent and cotangent vectors and therefore cannot switch freely between the two descriptions, as one might be used to from Riemannian geometry. Note furthermore that $\dot{q}$ is not the derivative of anything - it's simply notation for a new coordinate.

The Lagrangian is a function $L : TQ\to \mathbb{R}$. Given it, we may define a function $f : TQ\to T^\ast Q$ in local coordinates by $$ f(q,\dot{q}) = \left(q,\frac{\partial L}{\partial \dot{q}}(q,\dot{q})\right)$$ and the associated Hamiltonian $H : T^\ast Q \to \mathbb{R}$ in local coordinates as the Legendre transform $$ H(q,p) = \sup_{\dot{q}}\left(p_i \dot{q}^i - L(q,\dot{q})\right).$$ It should be clear here that neither $H(q,\dot{q})$ nor $L(q,p)$ is a meaningful object in this context - $H$ and $L$ act on different spaces, so you cannot feed a $p$ into $L$ at all. Observe now that $f$ does permit us to do this in some sense, but rigorously: if $f$ is invertible, one may define a "co-Lagrangian" or "Hamiltonian Lagrangian" $L_H : T^\ast Q \to\mathbb{R}$ by $L_H(q,p) = L(f^{-1}(q,p))$. Crucially, $L$ and $L_H$ are different functions and should, for clarity's sake, never be denoted by the same symbol.
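
As a minimal worked example, assume a harmonic oscillator on $Q=\mathbb{R}$ with mass $m$ and spring constant $k$ (an illustrative choice, not something the argument depends on). Then

$$ L(q,\dot{q}) = \tfrac{1}{2}m\dot{q}^2 - \tfrac{1}{2}kq^2, \qquad f(q,\dot{q}) = (q, m\dot{q}), \qquad f^{-1}(q,p) = \left(q,\frac{p}{m}\right), $$

and the Legendre transform and the pulled-back Lagrangian are

$$ H(q,p) = \frac{p^2}{2m} + \tfrac{1}{2}kq^2, \qquad L_H(q,p) = L\left(q,\frac{p}{m}\right) = \frac{p^2}{2m} - \tfrac{1}{2}kq^2. $$

Here $L$ and $L_H$ visibly differ as functions of their second slot ($\tfrac{1}{2}m x^2$ versus $\tfrac{1}{2m}x^2$), and $L_H$ is not $H$ either.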

The expression in the definition of the Legendre transform attains its extremum at $$ p_i = \frac{\partial L}{\partial \dot{q}^i}(q,\dot{q}),$$ which means that $$ H(q,p) = p_i\dot{q}^i - L(q,\dot{q})\tag{0}$$ holds exactly for a triple $(q,\dot{q},p)$ such that $$f(q,\dot{q}) = (q,p).\tag{1}$$ Note that, since $H$ does not depend on $\dot{q}$, the $\dot{q}$ in eq. (0) is a function $\dot{q}(q,p)$, defined implicitly by eq. (1).

Only when we impose the relation eq. (1) is there a functional relation between $q,\dot{q},p$; otherwise there is none. This is why, as abstract functions, the Lagrangian is not a function of $p$ and the Hamiltonian is not a function of $\dot{q}$ - these are coordinates on different spaces with no relation to each other. It is only when we impose eq. (1), in order to express the Hamiltonian without the extremisation procedure prescribed in the Legendre transform, that they become related, and not necessarily uniquely so. If $f$ is not invertible, then the Lagrangian system is a gauge theory and the Hamiltonian system is constrained - both terms essentially mean that the relation between the $p$ and the $\dot{q}$ is not uniquely defined.
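
A minimal sketch of a non-invertible $f$, using a system assumed purely for illustration: let $Q=\mathbb{R}^2$ and

$$ L(q,\dot{q}) = \tfrac{1}{2}m\left(\dot{q}^1\right)^2 - V(q^1,q^2), \qquad f(q^1,q^2,\dot{q}^1,\dot{q}^2) = \left(q^1,q^2,m\dot{q}^1,0\right). $$

Every value of $\dot{q}^2$ is sent to the same point, and the image of $f$ lies in the surface $p_2 = 0$. Eq. (1) then fixes $\dot{q}^1 = p_1/m$ but leaves $\dot{q}^2$ completely undetermined, while $p_2 = 0$ shows up as a constraint on the Hamiltonian side.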

Finally, let me address a closely related confusion which crops up for the same reason, i.e. not respecting the actual domains functions are defined on. The $q,\dot{q}$ arguments of the Lagrangian are independent, and become dependent only when we consider a path $\gamma: I\to Q$, which induces a path $\tilde{\gamma} : I\to TQ, t\mapsto (\gamma(t),\dot{\gamma}(t))$ on the tangent bundle, where $\dot{\gamma}$ now denotes the actual time derivative, i.e. the tangent vector field to $\gamma$. The action is a function $S : [I,Q]\to\mathbb{R}$, where $[I,Q]$ denotes the space of all maps $I\to Q$, and is defined as $$ S[\gamma] = \int_I L(\tilde{\gamma}(t))\,\mathrm{d}t.$$ When now considering this action, the physicist often writes the coordinates of $\tilde{\gamma}$ as $(q(t),\dot{q}(t))$, and it is only in this context that $\dot{q}(t)$ truly is a time-dependent function and the derivative of $q(t)$.
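
For a concrete instance, assume a free particle with $L(q,\dot{q}) = \tfrac{1}{2}m\dot{q}^2$ and the straight-line path $\gamma(t) = vt$ on $I=[0,T]$ (both chosen only for illustration). The induced path and the action are

$$ \tilde{\gamma}(t) = (vt, v), \qquad S[\gamma] = \int_0^T \tfrac{1}{2}mv^2\,\mathrm{d}t = \tfrac{1}{2}mv^2T. $$

As a function on $TQ$, $L$ accepts any pair $(q,\dot{q})$; it is only the lift $\tilde{\gamma}$ that ties the second slot to the derivative of the first.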


There's nothing stopping you from writing $L$ as a function of $q$ and $p$. In fact, you're required to write $L$ as a function of $q$ and $p$ to get the Hamiltonian! But the Euler-Lagrange equations become very ugly.

Consider the normal Euler-Lagrange equation

$$ \frac{d}{dt}\frac{\partial L}{\partial \dot q}=\frac{\partial L}{\partial q} $$

Let's try writing this in terms of $q,p$. The left hand side just becomes $\dot p$. But the right hand side is a lot uglier. We'd have

$$ \frac{\partial }{\partial q}L(q, p(q,\dot{q}))=\frac{\partial L}{\partial q}+\frac{\partial L}{\partial p}\frac{\partial p}{\partial q} $$ and the Euler-Lagrange equation becomes

$$ \dot{p}=\frac{\partial L}{\partial q}+\frac{\partial L}{\partial p}\frac{\partial p}{\partial q} $$

This might not look ugly at first glance, but it is actually terrible. In order to write down the proper Euler-Lagrange equation, we need to know the functional form of $p$ in terms of $q$ and $\dot{q}$, i.e. $p(q,\dot{q})$. Thus, the Lagrangian as a function of $(q,p)$ alone is not sufficient to generate the equations of motion. This is avoided when we go to the Hamiltonian formalism, where Hamilton's equations treat $p$ and $q$ as independent.
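
To see the extra term at work, here is a sketch with an assumed example that is not part of the argument above: a particle with position-dependent mass $m(q)$ in a potential $V(q)$. In terms of $(q,\dot{q})$,

$$ L = \tfrac{1}{2}m(q)\dot{q}^2 - V(q), \qquad p = \frac{\partial L}{\partial \dot{q}} = m(q)\dot{q}. $$

Written in terms of $(q,p)$, the Lagrangian is $L = \frac{p^2}{2m(q)} - V(q)$, so that, with $p$ held fixed in the first two derivatives and $\dot{q}$ held fixed in the third,

$$ \frac{\partial L}{\partial q} = -\frac{m'(q)\,p^2}{2m(q)^2} - V'(q), \qquad \frac{\partial L}{\partial p} = \frac{p}{m(q)}, \qquad \frac{\partial p}{\partial q} = m'(q)\dot{q} = \frac{m'(q)\,p}{m(q)}. $$

The Euler-Lagrange equation in these variables then reads

$$ \dot{p} = -\frac{m'(q)\,p^2}{2m(q)^2} - V'(q) + \frac{m'(q)\,p^2}{m(q)^2} = \frac{m'(q)\,p^2}{2m(q)^2} - V'(q). $$

Dropping the $\frac{\partial L}{\partial p}\frac{\partial p}{\partial q}$ term would flip the sign of the mass-gradient contribution. For comparison, with $H = \frac{p^2}{2m(q)} + V(q)$, Hamilton's equation $\dot{p} = -\frac{\partial H}{\partial q}$ gives the same right hand side directly, without any reference to $p(q,\dot{q})$.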