*understanding* covariance vs. contravariance & raising / lowering

What makes the typical physics explanation of differential geometry so confusing is that it tends to be so coordinate-based that it's hard to grasp that most of the objects do not depend on a coordinate system. Coming from mathematics, I'm more used to referring to the vectors not as contra- and covariant vectors, but as tangent vectors and differentials (or cotangent vectors).

Let's imagine our manifold is the surface of the Earth. We have a nice map of it in an atlas with longitudes along the x-axis and latitudes along the y-axis. Let's use the coordinates $x^{\mu}$ where $\mu\in\{\text{long},\text{lat}\}$, but remember these only serve to tell us where on the map a certain position on Earth is.

Now, let's say we go for a walk. We time our walk (in seconds) and at any time we are at some point $x(t)$ on Earth. We can describe these in coordinates as $x^\mu(t)$, and at any time we may give our speed as $\dot x^\mu=dx^\mu/dt$; just remember that the point on Earth $x(t)$ exists independent of which coordinates $x^\mu$ we use.

If we use degrees as the unit for latitudes and longitudes, the speed will have units $\text{deg}/\text{sec}$. The vector $\dot x$ is a tangent vector indicating our speed and direction independent of which coordinates we are using, and the natural way to draw such a tangent vector is as an arrow.

If I walk along the equator, the tangent vector is likely to be a rather short vector on the map. However, if I walk in the east--west direction close to one of the poles, the map is stretched out (relative to actual distances on the Earth), so the same walk might produce a rather long vector. I.e. when the map is stretched, the arrow representing the tangent vector gets stretched along with it.

Now, let's say we have a function $F$ that takes a value anywhere on Earth. It could be the altitude at the surface, the temperature, etc.: let's say it measures the temperature in Kelvin. At any point, the function has a gradient. If we wish to illustrate $F$ on the map, one way is to colour the map according to the values of $F$, or draw the contours of $F$ on the map, i.e. the curves for which $F$ is constant. If we stretch or deform the map, these contours will still be correct as they deform with the map, so they do not depend on the coordinate system. The differential $dF$ tells us how fast $F$ changes at any point and in any direction and has units $\text{K}/\text{deg}$. If we stretch the map, the contour lines get further apart, making the gradient appear less steep on the map. In coordinates, we write this $dF=(dF/dx^\mu)dx^\mu$ where $dx^\mu$ is just the gradient of the coordinate. The point, again, is that $dF$ is actually independent of the coordinate system.

If we combine our walk with the function $F$, we get $F(x(t))$ as the value along our path. The rate of change in time becomes $(d/dt)F(x(t))$, which we can write out as $$\frac{d}{dt}F(x(t))=\frac{dF}{dx^\mu}\dot x^\mu=dF\cdot\dot x\tag{1}$$ and this is again independent of the coordinate system. The $dF$ and $\dot x$ are the differential of $F$ and the tangent vector of $x$, both of which are independent of the coordinates we choose to use. The units are also informative: $\dot x$ has units $\text{deg}/\text{sec}$, while $dF$ has units $\text{K}/\text{deg}$, so the degrees cancel in the product and we are left with $\text{K}/\text{sec}$.
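To make (1) concrete, here is a small sketch of my own with an invented temperature field and an invented walk; the names `F`, `dF`, `path`, `velocity` and the stretch factor `s` are all hypothetical. It stretches the map by rescaling the longitude coordinate: the tangent components grow, the differential components shrink, and the pairing $dF\cdot\dot x$ stays the same. No metric is used anywhere.

```python
# Hypothetical temperature field on the map, in kelvin, as a function of
# (longitude, latitude) in degrees. Any smooth function would do.
def F(lon, lat):
    return 280.0 + 0.5 * lon - 0.3 * lat + 0.01 * lon * lat

# Partial derivatives of F: the components of dF in (lon, lat) coordinates.
def dF(lon, lat):
    return (0.5 + 0.01 * lat, -0.3 + 0.01 * lon)

# Hypothetical walk x(t), with t in seconds, and its velocity components.
def path(t):
    return (10.0 + 0.002 * t, 50.0 + 0.001 * t * t)

def velocity(t):
    return (0.002, 0.002 * t)

def pairing(covector, vector):
    # dF . xdot = sum over mu of (dF)_mu * xdot^mu
    return sum(c * v for c, v in zip(covector, vector))

t = 3.0
lon, lat = path(t)
v = velocity(t)
w = dF(lon, lat)
rate = pairing(w, v)

# Stretch the map: new coordinates lon' = s * lon, lat' = lat.
s = 4.0
v_stretched = (s * v[0], v[1])   # tangent components stretch with the map
w_stretched = (w[0] / s, w[1])   # differential components shrink
rate_stretched = pairing(w_stretched, v_stretched)

# Finite-difference check of d/dt F(x(t)) along the walk.
h = 1e-6
numeric = (F(*path(t + h)) - F(*path(t - h))) / (2 * h)

print(rate, rate_stretched, numeric)
```

The three printed numbers should agree up to finite-difference error, which is the coordinate independence of (1) in miniature.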

From (1), we see that there is a natural way to take the product of a tangent vector with a differential. Indeed, the differentials (at any point) form the dual vector space of the vector space of tangent vectors, which is why they are also called cotangent vectors.

All of this is done entirely without the need for a metric.

The metric only comes into play when you e.g. want to convert tangent vectors into a measure of actual physical distances. If you want to compute the length of a path, you need a metric. Similarly, it's needed when computing speeds in absolute terms as in the kinetic energy $E_{\text{kin}}=\frac{m}{2}g_{\mu\nu}\dot x^\mu\dot x^\nu$. Yet another place is in field/wave equations where e.g. $g^{\mu\nu}(d\phi/dx^\mu)(d\phi/dx^\nu)$ may enter.
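As a rough illustration of where the metric enters, the sketch below assumes a perfectly spherical Earth of radius 6371 km (my assumption, not part of the discussion above). With longitude and latitude measured in degrees, the metric components are then $g = (R\pi/180)^2\,\mathrm{diag}(\cos^2(\text{lat}),\,1)$. The function names and the 70 kg mass are hypothetical. The same coordinate velocity in $\text{deg}/\text{sec}$ corresponds to quite different physical speeds at different latitudes.

```python
import math

R = 6.371e6  # assumed Earth radius in metres (idealised sphere)

def metric(lat_deg):
    # Components g_{mu nu} in (long, lat) coordinates measured in degrees.
    # The factor k converts degrees to radians.
    k = math.pi / 180.0
    c = math.cos(lat_deg * k)
    return [[(R * k * c) ** 2, 0.0],
            [0.0, (R * k) ** 2]]

def speed(lat_deg, xdot):
    # Physical speed in m/s: sqrt(g_{mu nu} xdot^mu xdot^nu).
    g = metric(lat_deg)
    sq = sum(g[m][n] * xdot[m] * xdot[n] for m in range(2) for n in range(2))
    return math.sqrt(sq)

def kinetic_energy(mass, lat_deg, xdot):
    # E_kin = (m/2) g_{mu nu} xdot^mu xdot^nu, in joules.
    return 0.5 * mass * speed(lat_deg, xdot) ** 2

xdot = (1e-5, 0.0)  # walking due east at 1e-5 deg/sec of longitude

for lat in (0.0, 60.0, 85.0):
    print(lat, speed(lat, xdot), kinetic_energy(70.0, lat, xdot))
```

The tangent vector `xdot` is the same list of numbers in all three cases; only the metric, evaluated at the point where you are standing, turns it into metres per second.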

Connections, which are mathematical objects that tell you how to parallel transport vectors along a path from one point to another, can be defined without a metric. However, if there is a metric, there is a particular connection, the Levi-Civita connection, which naturally corresponds to the metric (which is natural since you need the metric to specify what is meant by the shortest path). When specifying the Levi-Civita connection (which in a coordinate system is done with the Christoffel symbols), you will encounter raising/lowering of indices.

While the metric does induce a natural way to identify the tangent and cotangent vector spaces, which is the identification that is applied when raising or lowering indices, this identification is metric dependent and should therefore only be required when you are computing something that depends on the metric.

My recommendation would be not to try to attribute meaning, at least not too much, to this identification of the tangent and cotangent vector spaces. Instead, you could think of why these enter the picture in physics at all and understand those cases.


The short version is that indices tell you how things behave under arbitrary changes of coordinates. (The long version genuinely requires some level of comfort with abstract algebra to appreciate.) When you use the metric to transform contravariant things to covariant things, the operation you're performing does not commute with arbitrary changes of coordinates; it only commutes with those changes of coordinates which also preserve the metric. So as long as you only work with maps that preserve the metric, there's no harm in doing this, but the moment you allow maps that don't preserve the metric, you have to be careful what kind of identifications you're making.
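Here is a small numerical sketch of that point, using the Euclidean plane with $g_{ij}=\delta_{ij}$ and two linear coordinate changes I made up for illustration (a rotation and a stretch); all function names are hypothetical. Vector components transform with the matrix $A$, covector components with its inverse transpose. Lowering the index with the same matrix $\delta$ before or after the change gives the same answer for the rotation but not for the stretch, unless the metric components are transformed along with everything else.

```python
import math

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def transpose(M):
    return [[M[j][i] for j in range(2)] for i in range(2)]

def inverse(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(a, b))

g = [[1.0, 0.0], [0.0, 1.0]]   # Euclidean metric components g_ij
v = [1.0, 2.0]                 # contravariant components v^i

def lower_then_transform(A):
    # v_i = g_ij v^j, then transform as a covector (inverse transpose of A).
    return matvec(transpose(inverse(A)), matvec(g, v))

def transform_then_lower(A, metric):
    # Transform as a vector with A, then lower with the given metric components.
    return matvec(metric, matvec(A, v))

def transformed_metric(A):
    # g'_ij = (A^-1)^k_i g_kl (A^-1)^l_j
    A_inv = inverse(A)
    return matmul(matmul(transpose(A_inv), g), A_inv)

theta = 0.3
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
stretch = [[2.0, 0.0], [0.0, 1.0]]

rotation_ok = close(lower_then_transform(rotation),
                    transform_then_lower(rotation, g))
stretch_ok = close(lower_then_transform(stretch),
                   transform_then_lower(stretch, g))
stretch_fixed = close(lower_then_transform(stretch),
                      transform_then_lower(stretch, transformed_metric(stretch)))

print(rotation_ok, stretch_ok, stretch_fixed)
```

The rotation preserves $\delta_{ij}$, so the identification survives it. The stretch does not, and the two routes only agree again once the metric is treated as a tensor with its own transformation law.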

Covariant and contravariant tensors live in different spaces, but things don't have to live in the same space to interact. We have the freedom to talk about operations $f : X \times Y \to Z$ that take inputs of two different types and return an output of potentially yet another type, and tensor contraction is an operation of this form.

I'm not sure what you mean by the "meaning" behind changing a vector from covariant to contravariant.


Different physical quantities have different transformation rules. Just to give an example:

  • position $q^i$ transforms contravariantly
  • momentum $p_i$ transforms covariantly

Why? In classical mechanics the Lagrangian is defined as a function of $q^i$ and its derivatives. On the other hand, the generalized momentum is given as:

$$ p_i=\frac{\partial L}{\partial \dot{q}^i} $$

If we change from $q$ to $\bar{q}$, the transformation law for the momenta will be the inverse of that for the coordinates, due to the chain rule. This is just an example. What we choose to frame physics in terms of is in some sense a choice. Because we can convert covariant to contravariant objects with the metric, there are many ways to frame a given set of physical laws.
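To spell out the chain-rule step (a sketch, assuming a time-independent change of coordinates $\bar{q}^j=\bar{q}^j(q)$, which is my simplifying assumption): the velocities transform as

$$ \dot{\bar{q}}^j = \frac{\partial \bar{q}^j}{\partial q^i}\dot{q}^i \qquad \Rightarrow \qquad \frac{\partial \dot{\bar{q}}^j}{\partial \dot{q}^i} = \frac{\partial \bar{q}^j}{\partial q^i} $$

so that, writing $\bar{p}_j = \partial L/\partial \dot{\bar{q}}^j$,

$$ p_i = \frac{\partial L}{\partial \dot{q}^i} = \frac{\partial L}{\partial \dot{\bar{q}}^j}\frac{\partial \dot{\bar{q}}^j}{\partial \dot{q}^i} = \frac{\partial \bar{q}^j}{\partial q^i}\,\bar{p}_j \qquad \text{i.e.} \qquad \bar{p}_j = \frac{\partial q^i}{\partial \bar{q}^j}\,p_i $$

The velocity picks up the Jacobian and the momentum picks up its inverse, so the combination $p_i\dot{q}^i = \bar{p}_j\dot{\bar{q}}^j$ is the same in both coordinate systems.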

Turning to your question about whether changing frames of reference changes covariant objects into contravariant ones: this is not the case. Raising or lowering an index with the metric is not a coordinate change; it is something quite different. It's a way of changing notation, or more mathematically speaking, it is the implementation of an isomorphism.

More important than the choice of notation (writing tensors contravariantly or covariantly) is the construction of the action or Lagrangian. It must satisfy certain symmetries depending on what kind of physics you consider:

$$ L = \frac{m}{2} \vec{v} \cdot \vec{v} = \frac{m}{2} v_iv^i $$

The dot product is invariant under rotations, so this Lagrangian is invariant under rotations, as it ought to be since it models a free particle in Euclidean space.

$$ L = kF_{\mu \nu}F^{\mu \nu} $$

where $F_{\mu \nu}$ is the Faraday tensor, which transforms covariantly, whereas $F^{\mu \nu}$ is the contravariant version. Together they form a scalar with respect to Lorentz transformations (I'm avoiding the full discussion about Poincaré transformations here).
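A quick numerical sanity check of that scalar, as a sketch only: I assume units with $c=1$, the metric $\eta=\mathrm{diag}(-1,1,1,1)$, and one common sign convention for filling $F_{\mu\nu}$ from $\vec{E}$ and $\vec{B}$ (the invariance does not depend on that choice). The field values and the function names are made up. The covariant components are transformed with a boost, indices are raised with the same $\eta$ in both frames, and the contraction comes out the same.

```python
import math

N = 4
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(A):
    return [[A[j][i] for j in range(N)] for i in range(N)]

def faraday_lower(E, B):
    # F_{mu nu} in one common sign convention (an assumption here):
    # F_{0i} = -E_i, F_{ij} = epsilon_{ijk} B_k.
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [[0.0, -Ex, -Ey, -Ez],
            [Ex, 0.0, Bz, -By],
            [Ey, -Bz, 0.0, Bx],
            [Ez, By, -Bx, 0.0]]

def raise_both(F_lower):
    # F^{mu nu} = eta^{mu a} F_{ab} eta^{b nu}; eta is its own inverse.
    return matmul(matmul(ETA, F_lower), ETA)

def invariant(F_lower):
    # The full contraction F_{mu nu} F^{mu nu}.
    F_upper = raise_both(F_lower)
    return sum(F_lower[m][n] * F_upper[m][n]
               for m in range(N) for n in range(N))

def boost_x(beta):
    # Lorentz boost along x with velocity beta (c = 1).
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[gamma, -gamma * beta, 0.0, 0.0],
            [-gamma * beta, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform_lower(F_lower, beta):
    # Covariant components use the inverse transformation, which for a
    # boost is the boost with the opposite velocity.
    L_inv = boost_x(-beta)
    return matmul(matmul(transpose(L_inv), F_lower), L_inv)

E = (1.0, 2.0, 3.0)
B = (0.5, -1.0, 2.0)
F = faraday_lower(E, B)
F_boosted = transform_lower(F, 0.6)

print(invariant(F), invariant(F_boosted))
```

With these conventions the contraction equals $2(|\vec{B}|^2-|\vec{E}|^2)$. The reason raising with the same $\eta$ in both frames is legitimate is precisely that a boost preserves $\eta$.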

The question of physics is partly this: how can you construct scalars given the symmetry of your theory? Ultimately this leads to the study of representation theory, spinors etc... it's not a short story and the question you are asking is certainly worth asking.

In the coordinate-free language the covariance of the components is balanced by the contravariance of the basis, or vice versa. Note as an example: $$ \bar{A}_{\mu'} = \Lambda^{\nu}_{\mu'}A_{\nu} \qquad \text{whereas} \qquad d\bar{x}^{\mu'} = \frac{\partial \bar{x}^{\mu'}}{\partial x^{\nu}}dx^{\nu} $$ where $\bar{x}^{\nu'} = \Lambda^{\nu'}_{\mu} x^{\mu}$. Here I'm working in Minkowski space and considering a coordinate change that is constant over all points in spacetime: a Lorentz transformation. Differentiating gives $\Lambda^{\nu'}_{\mu} = \frac{\partial \bar{x}^{\nu'}}{\partial x^{\mu}}$, and the inverse transformation has components $\Lambda^{\mu}_{\nu'} = \frac{\partial x^{\mu}}{\partial \bar{x}^{\nu'}}$. Putting it together: $\frac{\partial \bar{x}^{\nu'}}{\partial x^{\mu}}$ is inverse to $\frac{\partial x^{\mu}}{\partial \bar{x}^{\nu'}}$ by the chain rule, which means that:

$$ \frac{\partial \bar{x}^{\nu'}}{\partial x^{\mu}}\frac{\partial x^{\mu}}{\partial \bar{x}^{\alpha'}} = \delta_{\alpha'}^{\nu'} $$

which we could write in the $\Lambda$ notation as $\Lambda^{\nu'}_{\mu}\Lambda^{\mu}_{\alpha'}=\delta_{\alpha'}^{\nu'} $. The form $A$ can either be written in the barred or unbarred coordinates.

$$ A = \bar{A}_{\mu'}d\bar{x}^{\mu'} = A_{\nu}dx^{\nu} $$

The claim that these are in fact equal is supported by the transformation laws I gave above for the covariant components of $A$ and the contravariant transformation of the basis forms $dx^{\mu}$.
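Explicitly, substituting both transformation laws (a short check using only the relations already given):

$$ \bar{A}_{\mu'}d\bar{x}^{\mu'} = \left(\Lambda^{\nu}_{\mu'}A_{\nu}\right)\left(\Lambda^{\mu'}_{\alpha}dx^{\alpha}\right) = \Lambda^{\nu}_{\mu'}\Lambda^{\mu'}_{\alpha}\,A_{\nu}dx^{\alpha} = \delta^{\nu}_{\alpha}A_{\nu}dx^{\alpha} = A_{\nu}dx^{\nu} $$

where the third equality uses $\Lambda^{\nu}_{\mu'}\Lambda^{\mu'}_{\alpha}=\delta^{\nu}_{\alpha}$, which is the same inverse relation as before with the roles of the barred and unbarred indices exchanged.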

The mathematics I'm outlining here is mostly linear algebra and the concept of a basis. The transformation law for the basis is inverse to that of the components. The fundamental object considered, be it a vector, form, tensor etc., is invariant under the coordinate change. It is our picture of it that changes. That is how I think about it.