Inconsistency with partial derivatives as basis vectors?

Raising and lowering the indices of a vector itself (as opposed to its components) is not a valid operation, and basis vectors are no exception. While $x_\mu=g_{\mu\nu}x^\nu$ is a valid operation, $\hat e^\mu=g^{\mu\nu}\hat e_\nu$ is not. The reason is that in the first case you are dealing with the components of a vector, and in the second case you are dealing with a vector itself.

Let me elaborate. Given a vector $\hat X$ $$ \hat X=x^\mu\hat e_\mu $$ you can lower the index $\mu$ in $x^\mu$ through $$ x_\mu\equiv g_{\mu\nu}x^\nu $$

That is: raising and lowering indices is an operation that is defined for the components of a vector (or covector).
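A minimal Python sketch of this point, assuming the Minkowski metric $\mathrm{diag}(-1,1,1,1)$ purely for illustration (the helper names `lower` and `raise_index` are hypothetical). A vector is stored only through its list of components $x^\mu$, so lowering and raising are matrix operations on numbers; the basis vectors $\hat e_\mu$ never appear as data.

```python
# Sketch: raising/lowering acts on component lists (plain numbers), not on basis vectors.
# Assumption: Minkowski metric with signature (-,+,+,+), chosen only for illustration.

def lower(g, x_up):
    """Return x_mu = g_{mu nu} x^nu for a metric stored as a nested list g[mu][nu]."""
    n = len(x_up)
    return [sum(g[mu][nu] * x_up[nu] for nu in range(n)) for mu in range(n)]

def raise_index(g_inv, x_down):
    """Return x^mu = g^{mu nu} x_nu using the inverse metric g_inv[mu][nu]."""
    n = len(x_down)
    return [sum(g_inv[mu][nu] * x_down[nu] for nu in range(n)) for mu in range(n)]

eta = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]
eta_inv = eta  # diag(-1, 1, 1, 1) is its own inverse

x_up = [2, 1, 0, 3]          # components x^mu of X = x^mu e_mu
x_down = lower(eta, x_up)    # components x_mu
print(x_down)                                # [-2, 1, 0, 3]
print(raise_index(eta_inv, x_down) == x_up)  # True
```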

The index $\mu$ in $\hat e_\mu$ is not a vector index; it just labels the different basis vectors. You cannot raise/lower this index, because $\hat e_\mu$ does not denote the components of any vector. The operation $$ \phantom{\color{red}{\text{NO!}}}\qquad\hat e^\mu\equiv g^{\mu\nu}\hat e_\nu\qquad\color{red}{\text{NO!}} $$ is a meaningless operation.

The same thing can be said about covectors. Given a covector $\tilde X$ $$ \tilde X=x_\mu\tilde e^\mu $$ you can raise the index in $x_\mu$. But you cannot lower the index in $\tilde e^\mu$, because that index does not denote the components of a covector; it just labels the different basis covectors.

Most importantly, while $\hat e_\mu$ is a basis for the space of vectors, and $\tilde e^\mu$ is a basis for the space of covectors, these objects are not related through $$ \phantom{\color{red}{\text{NO!}}}\qquad\tilde e^\mu= g^{\mu\nu}\hat e_\nu\qquad\color{red}{\text{NO!}} $$ or any similar relation.

In short: you can raise/lower indices when those indices denote the components of an object - either a vector or a covector - but you cannot raise/lower the indices of the bases of vectors/covectors, because those indices do not denote the components of anything. They are just labels.

However, see Musical isomorphism.

I hope that at this point, you are still with me. Given an arbitrary vector $\hat v$ (like $\hat X$ or $\hat e_\mu$), and a certain function $f$, we can define the action of $\hat v$ on $f$ as follows: we define $$ \hat e_\mu[f]\equiv \frac{\partial f}{\partial x^\mu}\in\mathbb R $$ and we extend this through linearity: if $\hat v=v^\mu \hat e_\mu$, then $$ \hat v[f]=v^\mu\frac{\partial f}{\partial x^\mu}\in\mathbb R $$

I'm not going to discuss why this new operation is useful. But let me stress that this operation is something new, something you may never have seen before: now vectors can act on functions! In any case, useful or not, this new operation motivates the following convenient notation: we will write $\hat \partial_\mu$ instead of $\hat e_\mu$: $$ \hat \partial_\mu\equiv \hat e_\mu $$

With this, our equation from before now becomes $$ \hat\partial_\mu[f]=\frac{\partial f}{\partial x^\mu} $$
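To see the action $\hat v[f]=v^\mu\,\partial f/\partial x^\mu$ numerically, here is a small Python sketch. It assumes Cartesian coordinates on $\mathbb R^2$ and the example function $f(x,y)=x^2y$, and it approximates the partial derivatives by central differences; the helper names `partial` and `act` are hypothetical.

```python
# Sketch: a vector v = v^mu e_mu acting on a function f as v[f] = v^mu (df/dx^mu).
# Assumptions: Cartesian coordinates on R^2; partials approximated by central differences.

def partial(f, point, mu, h=1e-6):
    """Central-difference approximation of df/dx^mu at `point`."""
    plus = list(point)
    minus = list(point)
    plus[mu] += h
    minus[mu] -= h
    return (f(plus) - f(minus)) / (2 * h)

def act(v, f, point):
    """v[f] at `point`, extended by linearity from e_mu[f] = df/dx^mu."""
    return sum(v_mu * partial(f, point, mu) for mu, v_mu in enumerate(v))

def f(x):
    return x[0] ** 2 * x[1]  # f(x, y) = x^2 y

p = [1.0, 2.0]
print(act([1.0, 0.0], f, p))   # e_1[f] = 2xy, approximately 4
print(act([3.0, -1.0], f, p))  # 3*(2xy) - 1*(x^2), approximately 11
```

For this quadratic $f$ the central difference is exact up to floating-point rounding, so the printed values are very close to $4$ and $11$.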

Note that we are using the same symbol, $\partial$, with two different meanings: on the one hand, it denotes a basis vector, and on the other hand, it denotes a partial derivative. The usual thing we do is to drop the distinction: we just write $\partial_\mu$ for both, and let context decide what the symbol means.

In the same vein, we usually use the symbol $\mathrm dx^\mu\equiv\tilde e^\mu$. That is, we denote the basis of covectors by the symbol $\mathrm dx^\mu$. It's just notation.

Let us now move on to the gradient. We define the covector $\mathrm d f$ as the covector that has $\frac{\partial f}{\partial x^\mu}$ as components: $$ \mathrm d f=\frac{\partial f}{\partial x^\mu}\tilde e^\mu $$ or, using our new notation, $$ \mathrm d f=\partial_\mu f\,\mathrm dx^\mu $$

You can raise and lower the $\mu$ index in $\frac{\partial f}{\partial x^\mu}$, because this index denotes the components of a covector. In this sense, you could say that you can raise/lower the $\mu$ index in $\partial_\mu$, whenever this symbol denotes a derivative. But you cannot raise/lower the $\mu$ index in $\hat \partial_\mu$, whenever this symbol denotes a basis vector (for the same reason you cannot raise/lower the $\mu$ index in $\hat e_\mu$).

In short: the objects $\partial_\mu$ and $\mathrm dx^\mu$ replace the old notation $\hat e_\mu$ and $\tilde e^\mu$, but they denote exactly the same objects: bases for the space of vectors and the space of covectors, respectively. This means that you cannot raise/lower their indices. On the other hand, the object $\partial_\mu f$ denotes the components of the covector $\mathrm df$, and as such, you can raise/lower its index.
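As an illustration of raising the index of $\partial_\mu f$ (and not of $\partial_\mu$ as a basis vector), here is a sketch in polar coordinates $(r,\theta)$, where the metric is $g=\mathrm{diag}(1,r^2)$. The test function $f=x=r\cos\theta$ and the helper names are assumptions for the example. The raised components $g^{\mu\nu}\partial_\nu f$ are combined with the basis vectors $\partial_r,\partial_\theta$ (written in Cartesian components only for checking), which should reproduce the familiar gradient $(1,0)$ of $f=x$.

```python
import math

# Sketch: raising the index of the covector components d_mu f with the inverse metric.
# Assumptions: polar coordinates (r, theta) on the plane, g = diag(1, r^2),
# and the test function f = x = r cos(theta), chosen for illustration.

def df_components(r, theta):
    """Components (d_r f, d_theta f) of the covector df for f = r cos(theta)."""
    return [math.cos(theta), -r * math.sin(theta)]

def raise_with_polar_metric(r, w_down):
    """Return g^{mu nu} w_nu with g^{-1} = diag(1, 1/r^2)."""
    return [w_down[0], w_down[1] / r ** 2]

def to_cartesian(r, theta, v_up):
    """Express v^r d_r + v^theta d_theta in Cartesian components (for checking only)."""
    e_r = (math.cos(theta), math.sin(theta))               # d_r = (dx/dr, dy/dr)
    e_theta = (-r * math.sin(theta), r * math.cos(theta))  # d_theta = (dx/dtheta, dy/dtheta)
    return [v_up[0] * e_r[i] + v_up[1] * e_theta[i] for i in range(2)]

r, theta = 2.0, 0.7
grad_up = raise_with_polar_metric(r, df_components(r, theta))
print(to_cartesian(r, theta, grad_up))  # approximately [1.0, 0.0], the gradient of f = x
```

Note that $\partial_\theta f=-r\sin\theta$ is not itself the $\theta$-component of the gradient vector; the inverse metric supplies the extra factor $1/r^2$.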


To understand what happens when we raise or lower indices, we have to look at what the objects we are operating on actually are.

TL;DR - You lower (raise) the indices of the components of vectors (dual vectors), not those of their bases.

To see why derivatives are used as a basis, consider the following motivation. Imagine a curve somewhere in $\mathbf{R}^3$. The curve has a tangent vector at each of its points. If we denote the curve by the function $\mathbf{r}(\lambda)$, with $\lambda$ the curve parameter, the tangent vector is $$\mathbf{t} = \frac{d\mathbf{r}(\lambda)}{d\lambda} = \sum_i\frac{dx^i}{d\lambda}\hat{x}^i$$ So now we know the "rate and direction of change" of the curve at the point $\mathbf{r}(\lambda)$. We have gained a vector, and we would like to use this vector to describe other things happening over that manifold.

The next question we ask is: what is the rate of change of some other object in the direction of that first vector? We are still in $\mathbf{R}^3$, so we know how to find such rates of change: the nabla operator $\nabla = \frac{\partial}{\partial x} \hat{x} + \frac{\partial}{\partial y} \hat{y} + \frac{\partial}{\partial z} \hat{z}$. To find the rate of change in the direction of the previously gained vector, we project $\nabla$ onto $\mathbf{t}$:

$$\mathbf{t}\cdot\nabla = \sum_i \frac{dx^i}{d\lambda}\frac{\partial}{\partial x^i}$$ Here we see that by using tangent vectors, we can probe and gain information about rates of change of objects in certain directions.
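The projection $\mathbf t\cdot\nabla$ can be checked with a short Python sketch: along a curve, $\mathbf t\cdot\nabla f$ should equal $\frac{d}{d\lambda}f(\mathbf r(\lambda))$ by the chain rule. The helix $\mathbf r(\lambda)=(\cos\lambda,\sin\lambda,\lambda)$ and $f(x,y,z)=xy+z^2$ are arbitrary choices for the example, and the comparison derivative is a central difference.

```python
import math

# Sketch: the directional derivative t . grad f along a curve equals d/dlambda f(r(lambda)).
# Assumptions: the helix r(lambda) = (cos l, sin l, l) and f(x, y, z) = x*y + z**2
# are illustrative choices; the exact derivatives are written out by hand.

def curve(lam):
    return (math.cos(lam), math.sin(lam), lam)

def tangent(lam):
    """t = dr/dlambda, i.e. the components dx^i/dlambda."""
    return (-math.sin(lam), math.cos(lam), 1.0)

def f(x, y, z):
    return x * y + z ** 2

def grad_f(x, y, z):
    return (y, x, 2 * z)

def t_dot_grad(lam):
    """sum_i (dx^i/dlambda) (df/dx^i), evaluated on the curve."""
    t = tangent(lam)
    g = grad_f(*curve(lam))
    return sum(t_i * g_i for t_i, g_i in zip(t, g))

def df_dlambda(lam, h=1e-6):
    """Central-difference derivative of f along the curve, for comparison."""
    return (f(*curve(lam + h)) - f(*curve(lam - h))) / (2 * h)

lam = 0.4
print(t_dot_grad(lam), df_dlambda(lam))  # both approximately cos(0.8) + 0.8
```

For these choices both sides equal $\cos 2\lambda + 2\lambda$.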

Now, taking this motivation (vectors, when acting on some objects, give us information about some rate of change), we can construct a basis for tangent vectors over $\mathbf{R}^3$, namely $\{ \frac{\partial}{\partial x} , \frac{\partial}{\partial y}, \frac{\partial}{\partial z} \}$.

One can show that such a basis can be constructed at each point of a general manifold (using local coordinates $x^\mu$); in shorthand it is written $\{\partial_\mu\equiv\frac{\partial}{\partial x^\mu}\}$.

Vectors were constructed as objects acting on functions, and there are also objects that act on vectors and send them to real numbers, called dual vectors. Again, by drawing motivation from $\mathbf{R}^3$, we can construct a basis for these dual vectors, and we denote this basis as $\{ dx^\mu\}$, where this basis is defined by how it acts on the vector basis: $$dx^\mu(\partial_\nu) = \delta^\mu_\nu$$
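A quick numerical check of $dx^\mu(\partial_\nu)=\delta^\mu_\nu$ for a non-Cartesian chart, assuming polar coordinates $(r,\theta)$ on the plane. Purely to evaluate the pairing, both bases are written in Cartesian components: $\partial_r,\partial_\theta$ through $\partial x^i/\partial x^\mu$, and $\mathrm dr,\mathrm d\theta$ through $\partial x^\mu/\partial x^i$. These two Jacobians are inverse matrices, which is what the pairing checks.

```python
import math

# Sketch: the coordinate bases {d_mu} and {dx^mu} are dual, dx^mu(d_nu) = delta^mu_nu.
# Assumption: polar coordinates (r, theta) on the plane; both bases are written in
# Cartesian components only so that the pairing can be evaluated numerically.

def polar_bases(r, theta):
    d_r = (math.cos(theta), math.sin(theta))                # (dx/dr, dy/dr)
    d_theta = (-r * math.sin(theta), r * math.cos(theta))  # (dx/dtheta, dy/dtheta)
    dr = (math.cos(theta), math.sin(theta))                 # (dr/dx, dr/dy)
    dtheta = (-math.sin(theta) / r, math.cos(theta) / r)   # (dtheta/dx, dtheta/dy)
    return [d_r, d_theta], [dr, dtheta]

vectors, covectors = polar_bases(1.5, 0.3)
# pairing[mu][nu] = dx^mu(d_nu): a covector acting on a vector
pairing = [[sum(w_i * v_i for w_i, v_i in zip(w, v)) for v in vectors] for w in covectors]
print(pairing)  # approximately [[1, 0], [0, 1]]
```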

Now comes one of the key points where the mistake happened: we have defined a basis for vectors, and a vector is an object such as $u = u^\mu\partial_\mu$. For instance, we can construct a vector $v=1\cdot\partial_1$. Here the number 1 is a vector component, while $\partial_1$ is one of the basis vectors. Similarly, a dual vector is written as $\omega = \omega_\nu dx^\nu$.

Vectors and dual vectors have a special relationship: a dual vector $\omega$ acts on a vector $v$ and sends it to a real number. We can express this using their bases: $$\omega(v) = \omega_\mu dx^\mu\left(v^\nu\partial_\nu\right)=\omega_\mu v^\nu dx^\mu(\partial_\nu) = \omega_\mu v^\nu\delta^\mu_\nu = \omega_\mu v^\mu$$

Now we come to the metric tensor. A tensor is an object that acts on a certain number of vectors and dual vectors, depending on the tensor type. A metric tensor is a symmetric, non-degenerate tensor that acts on two vectors.

We can write the metric tensor using the previously defined bases as $$g = g_{\mu\nu}dx^\mu \otimes dx^\nu$$ So the metric tensor takes two vectors as input and outputs a real number. Written fully in a basis, and using $(dx^\mu\otimes dx^\nu)(u,v) = dx^\mu(u)\,dx^\nu(v)$, this is: $$g(u, v) = g_{\mu\nu}\,dx^\mu(u^\alpha \partial_\alpha)\, dx^\nu (v^\beta \partial_\beta)$$ $$g(u, v) = g_{\mu\nu}u^\alpha v^\beta dx^\mu (\partial_\alpha) dx^\nu(\partial_\beta)$$ $$g(u, v) = g_{\mu\nu}u^\alpha v^\beta \delta^\mu_\alpha \delta^\nu_\beta = g_{\mu\nu}u^\mu v^\nu$$

This operation has the shorthand notation $$g_{\mu\nu}u^\mu v^\nu = u_\nu v^\nu,\qquad u_\nu\equiv g_{\mu\nu}u^\mu,$$ and this is the only place where lowering and raising happen. This can be made formally precise by saying that the metric induces a natural isomorphism between vectors and dual vectors, $u\mapsto g(u,\cdot\,)$.
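The chain $g(u,v)=g_{\mu\nu}u^\mu v^\nu=u_\nu v^\nu$ can be sketched in a few lines of Python. The symmetric, non-degenerate metric $g_{\mu\nu}=\begin{pmatrix}2&1\\1&3\end{pmatrix}$ and the components of $u$ and $v$ are arbitrary example values; the hypothetical helper `pair` implements $\omega(v)=\omega_\mu v^\mu$, and `lower` produces the components $u_\nu$ of the dual vector $g(u,\cdot\,)$.

```python
# Sketch: g(u, v) = g_{mu nu} u^mu v^nu equals u_nu v^nu once u's components are lowered.
# Assumption: the symmetric, non-degenerate 2x2 metric below is an arbitrary example.

g = [[2, 1],
     [1, 3]]

def metric(g, u, v):
    """g(u, v) = sum over mu, nu of g_{mu nu} u^mu v^nu."""
    n = len(u)
    return sum(g[mu][nu] * u[mu] * v[nu] for mu in range(n) for nu in range(n))

def lower(g, u):
    """u_nu = g_{mu nu} u^mu: the components of the dual vector g(u, .)."""
    n = len(u)
    return [sum(g[mu][nu] * u[mu] for mu in range(n)) for nu in range(n)]

def pair(omega, v):
    """omega(v) = omega_mu v^mu, using dx^mu(d_nu) = delta^mu_nu."""
    return sum(w * x for w, x in zip(omega, v))

u, v = [1, 2], [3, -1]
u_down = lower(g, u)
print(u_down)                             # [4, 7]
print(metric(g, u, v), pair(u_down, v))   # 5 5
```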

So, when you lower indices, you must only act on the components of vectors, not on the basis; likewise, when you raise indices, you must only act on the components of dual vectors, not on their basis.

For a good reference, I recommend "An Introduction to Manifolds" by Loring W. Tu.


I think your source of confusion is conflating the use of enumeration indices for basis vectors, and the use of vector indices for the components of a vector. These two types of indices need to be treated differently. First I will say what I mean by the two types of indices, then I will say how they need to be treated differently.

The first type of index is an enumeration index for the basis. So let's suppose we have an $n$-dimensional vector space, and let's choose a basis. The basis vectors can be written as $$\hat{e}_\mu,\quad \mu = 1,2,3,\cdots,n.$$ In this case, the index $\mu$ is an enumeration index, used just to list the basis vectors.

Now a vector $v$ can be written using coordinates with respect to this basis: $v=v^\mu \hat{e}_\mu$. Here the $\mu$ in $v^\mu$ is a vector index. The difference is that for each value of $\mu$, $v^\mu$ is just a number, whereas $\hat{e}_\mu$ is a vector. Additionally, if we change to a new basis $\hat{\tilde{e}}_\mu$, related to the original basis $\hat{e}_\mu$ by $$\hat{\tilde{e}}_\nu = R_\nu{}^\mu \hat{e}_\mu, $$ then the coordinates $\tilde{v}^\nu$ of $v$ with respect to the new basis $\hat{\tilde{e}}_\nu$ are related to the old coordinates $v^\mu$ with respect to the old basis $\hat{e}_\mu$ by $$\tilde{v}^\nu = (R^{-1})^\nu{}_\mu v^\mu,$$ where $(R^{-1})^\nu{}_\mu R_\nu{}^\lambda=\delta_\mu{}^\lambda$, so that $\tilde v^\nu\hat{\tilde e}_\nu = v^\mu\hat e_\mu$: the vector itself does not change.
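A small sketch of the two transformation laws, assuming vectors of $\mathbb R^2$ stored as Cartesian pairs (only so that results can be compared) and an arbitrary example matrix $R$. The list `e` is an enumerated list of vectors, while `v` is a list of numbers; after changing the basis with $R$ and the components with $R^{-1}$, both descriptions give the same vector. `Fraction` keeps the arithmetic exact, and the helper names are hypothetical.

```python
from fractions import Fraction

# Sketch: basis vectors and components transform in opposite ways, so v itself is unchanged.
# Assumptions: vectors of R^2 are stored as Cartesian pairs only so results can be compared;
# the old basis and the change-of-basis matrix R are arbitrary examples.

def inverse_2x2(m):
    """Ordinary matrix inverse of a 2x2 matrix, with exact Fraction entries."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def combine(coeffs, basis):
    """Return sum_mu coeffs[mu] * basis[mu] as a Cartesian pair."""
    return [sum(c * e_mu[i] for c, e_mu in zip(coeffs, basis)) for i in range(2)]

e = [(1, 0), (1, 1)]     # old basis e_mu (an enumerated list of vectors)
R = [[2, 1], [0, 1]]     # R[nu][mu] = R_nu^mu
e_new = [combine(R[nu], e) for nu in range(2)]  # new basis: R_nu^mu e_mu

v = [1, 2]               # old components v^mu (plain numbers)
R_inv = inverse_2x2(R)   # sum_mu R[nu][mu] * R_inv[mu][lam] = delta
# (R^{-1})^nu_mu v^mu in the index notation above: the inverse transpose acts on the components
v_new = [sum(R_inv[mu][nu] * v[mu] for mu in range(2)) for nu in range(2)]

print(v_new)                                  # [Fraction(1, 2), Fraction(3, 2)]
print(combine(v, e) == combine(v_new, e_new)) # True: same vector, different components
```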

I think by now I have explained how these two types of indices must be treated differently. One enumerates a set of vectors; the other enumerates a set of real-number coordinates that transform under changes of basis.

Now let's suppose we have an inner product, with components $g_{\mu\nu}=\langle\hat e_\mu,\hat e_\nu\rangle$. For any basis $\hat{e}_\mu$ there is also a dual basis $\hat{e}^\mu$ for the covector space (this part does not require the inner product), satisfying $$\hat{e}^\mu(\hat{e}_\nu) = \delta^\mu{}_\nu.$$

Now, given any vector $v$, you can associate with it a dual vector $v'$, which acts on vectors $w$ by taking the inner product with $v$: $v'(w)=\langle v, w \rangle$. To get the coordinates of this dual vector $v'$, we write it in the form $v' = v'_\mu \hat{e}^\mu$. We find that $$ v^\mu g_{\mu \nu} = \langle v, \hat{e}_\nu \rangle = v'(\hat{e}_\nu) = v'_\mu\,\hat{e}^\mu(\hat{e}_\nu) = v'_\mu\delta^\mu{}_\nu = v'_\nu. $$ Therefore, if $v^\mu$ are the coordinates of a vector with respect to some basis, then the coordinates $v'_\mu$ of the dual vector $v'$ with respect to the dual basis are given by $v'_\nu = v^\mu g_{\mu \nu}$. In this sense, you can use the metric to lower the indices on coordinates (and its inverse $g^{\mu\nu}$ to raise them). This is made possible because each coordinate is a real number, and when you take linear combinations of real numbers, you get another real number.
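The construction above can be sketched numerically, assuming $\mathbb R^2$ with the ordinary dot product as the inner product and an arbitrary non-orthonormal basis. Vectors are Cartesian columns and covectors Cartesian rows, so a covector acts on a vector through a row-times-column product (the hypothetical helper `dot`). The dual basis comes from inverting the matrix of basis vectors, with no metric involved; the metric only enters when $v^\mu$ is turned into $v'_\nu$.

```python
from fractions import Fraction

# Sketch: dual basis from a basis, then v'_nu = v^mu g_{mu nu} with g_{mu nu} = <e_mu, e_nu>.
# Assumptions: R^2 with the ordinary dot product; vectors are Cartesian columns and
# covectors are Cartesian rows, so a covector acts on a vector by a row-column product.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

e = [(1, 0), (1, 1)]   # a non-orthonormal basis e_mu

# Dual basis: the rows of E^{-1}, where E has the e_mu as columns, so e^mu(e_nu) = delta.
E = [[e[0][0], e[1][0]],
     [e[0][1], e[1][1]]]
(a, b), (c, d) = E
det = Fraction(a * d - b * c)
dual = [(d / det, -b / det), (-c / det, a / det)]   # rows of E^{-1}

g = [[dot(e[mu], e[nu]) for nu in range(2)] for mu in range(2)]  # g_{mu nu}

v_up = [2, 1]                                                    # components v^mu
v_vec = [sum(v_up[mu] * e[mu][i] for mu in range(2)) for i in range(2)]
v_down = [sum(v_up[mu] * g[mu][nu] for mu in range(2)) for nu in range(2)]   # v'_nu
v_prime = [sum(v_down[mu] * dual[mu][i] for mu in range(2)) for i in range(2)]  # v'_mu e^mu

print([[dot(dual[mu], e[nu]) for nu in range(2)] for mu in range(2)])  # identity, as Fractions
print(v_down)                                                    # [3, 4]
print(all(dot(v_prime, w) == dot(v_vec, w) for w in [(1, 0), (0, 1), (2, -5)]))  # True
```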

On the other hand, you cannot say $\hat{e}_\nu = \hat{e}^\mu g_{\mu \nu}$, because the left-hand side is a vector, while the right-hand side is a linear combination of covectors, which gives you another covector; covectors and vectors are different kinds of objects, so they can't be equal.

I think this should answer your first two questions. I don't really know the answer to the third question, other than to say that the easiest way of defining the tangent space is in terms of derivative operators, and the partial derivatives with respect to the coordinates form a natural basis.