Normalization of the action in Special Relativity

I prefer to think of it like this. An inertial observer ($X$) is at rest in his/her own reference frame. The world-line of $X$ is the longest possible route between any two events on it. This follows because, in its own frame, $X$ moves purely along the temporal axis, so $ds=c\,d\tau$ (displacement in time; $c$ is the speed of light) and $d\mathbf{r}=\mathbf{0}$. Bear in mind that, in general, the 4d distance between two nearby events within the light cone is $ds=\sqrt{c^2 dt^2 - dr^2}$, where $dt$ is the coordinate-time separation (equal to $d\tau$ in the rest frame of $X$), so any spatial displacement can only shorten $ds$.
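As a quick numerical illustration of the "longest route" claim, here is a minimal Python sketch. It assumes one spatial dimension, units where $c=1$, and world-lines made of straight segments; the function name `proper_time` and the sample events are made up for this example.

```python
import math

C = 1.0  # speed of light; units with c = 1 are an assumption of this sketch


def proper_time(events):
    """Sum d(tau) = sqrt(c^2 dt^2 - dx^2) / c over straight world-line segments.

    `events` is a list of (t, x) points, one spatial dimension only.
    """
    total = 0.0
    for (t0, x0), (t1, x1) in zip(events, events[1:]):
        dt, dx = t1 - t0, x1 - x0
        interval_sq = (C * dt) ** 2 - dx ** 2
        if interval_sq < 0:
            raise ValueError("segment is space-like (faster than light)")
        total += math.sqrt(interval_sq) / C
    return total


# Events A and B, described in the rest frame of the inertial observer X.
A, B = (0.0, 0.0), (10.0, 0.0)

straight = proper_time([A, B])             # X stays at x = 0 the whole time
detour = proper_time([A, (5.0, 3.0), B])   # out at 0.6c, then back at 0.6c

print(straight)  # 10.0
print(detour)    # 8.0
```

Any detour between the same two events accumulates less proper time than the straight world-line, which is what "longest possible route" means here.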

We can therefore describe the path taken by the observer, $\bar{x}^\mu=\bar{x}^\mu\left(\tau\right)$, as the (unique) longest possible path between events $A$ and $B$ that lie on the world-line. Since the length of a curve (in the 4d sense) is an invariant quantity, the universal description of the observer's path is the world-line that extremizes:

$S\left[\bar{x}\right]\propto\int^B_A cd\tau$

where $c\,d\tau$ is the length of a small segment of the world-line. We can then fix the sign of the scaling constant by demanding that this quantity, which we shall call the "action", be minimized to arrive at the correct world-line:

$S\left[\bar{x}\right]=-\alpha\int^B_A cd\tau,\quad \alpha>0$

The above prescription must be valid for all observers, including those that observe $X$ as moving at a small velocity $\mathbf{v}\: \left(\left|\mathbf{v}\right|\ll c\right)$. Say that any such observer is in the lab-frame. In the lab-frame we can describe $X$ with the classical action; thus, in the limit $\left|\mathbf{v}\right|/c\to 0$:

$S=-\alpha\int^B_A cd\tau \to \int^B_A \frac{m\left|\mathbf{v}\right|^2}{2} dt$

Next, the world-line of $X$ in the lab-frame is, by construction, $\bar{x}^\mu=\left(ct,\,\bar{\mathbf{r}}\left(t\right)\right)^\mu$, with $d\bar{\mathbf{r}}/dt=\mathbf{v}$. The 4d distance between any two (close) events on $\bar{x}$ is:

$ds=c\,d\tau=\sqrt{c^2 dt^2-\left|d\bar{\mathbf{r}}\right|^2}=\sqrt{c^2-\left|\mathbf{v}\right|^2}\,dt$

Thus, in the limit $\left|\mathbf{v}\right|/c\to 0$

$-\alpha\int^B_A \sqrt{c^2-\left|\mathbf{v}\right|^2}dt \to \int^B_A \frac{m\left|\mathbf{v}\right|^2}{2} dt$

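This fixes $\alpha$: expanding $\sqrt{c^2-\left|\mathbf{v}\right|^2}\approx c-\left|\mathbf{v}\right|^2/\left(2c\right)$ gives a constant term, which does not affect the equations of motion, plus $\frac{\alpha}{2c}\left|\mathbf{v}\right|^2$, so matching the kinetic term requires $\alpha=mc$. Of course, the specific value of $\alpha$ is more of a convention.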
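The low-velocity matching can be checked numerically. The sketch below assumes the identification $\alpha=mc$ and units where $m=c=1$; the function names are invented for this illustration. It drops the constant $-mc^2$ (which does not change the equations of motion) and compares what is left with $m\left|\mathbf{v}\right|^2/2$.

```python
import math


def relativistic_lagrangian(v, m=1.0, c=1.0):
    """Integrand -alpha * sqrt(c^2 - v^2), with the assumed choice alpha = m*c."""
    alpha = m * c
    return -alpha * math.sqrt(c**2 - v**2)


def classical_kinetic(v, m=1.0):
    """Classical kinetic term m v^2 / 2."""
    return 0.5 * m * v**2


m, c = 1.0, 1.0
for v in (1e-1, 1e-2, 1e-3):
    # Add back m*c^2 to cancel the velocity-independent constant.
    shifted = relativistic_lagrangian(v, m, c) + m * c**2
    ratio = shifted / classical_kinetic(v, m)
    print(v, ratio)  # the ratio approaches 1 as v/c -> 0
```

Any other choice of $\alpha$ would make this ratio tend to $\alpha/\left(mc\right)$ instead of 1.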

For a single particle, it does not matter what prefactor you use: the equations of motion and everything else stay the same. The factors only start to matter when you couple different systems to each other. For example, consider a charged particle in an electromagnetic field described by a vector potential $A_\mu$. The right action describing its movement (in units where $c=1$) is

$$ S = S_{\text{EM}}+\int d\tau \left[ -m - q A_\mu \frac{dX^\mu}{d\tau} \right] , $$ where $S_\text{EM}$ is the action for the electromagnetic field.

This only works out to give the correct dynamics if the proportionality factor $m$ here corresponds to the mass of the particle.

Similarly, when one tries to extract meaningful quantities like the energy-momentum tensor from the action, only one choice of the factor will give the correct result.

I personally like to think of it as the particle having an always present static contribution to its potential energy, coming from its rest-mass energy (hence the negative sign), but don't take that too seriously.

The equation you mention is the action of a single point particle. $$S =-mc^2\int d\tau$$

The unit of action is energy multiplied by time; in the present case, that is the rest energy $mc^2$ of the particle multiplied by the proper time of the particle.
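To get a feel for the units, here is a small sketch that evaluates $S=-mc^2\tau$ for an electron over one second of proper time. The choice of particle and time span is only an example, and the mass is a rounded value.

```python
C = 299_792_458.0            # speed of light in m/s (exact SI value)
M_ELECTRON = 9.1093837e-31   # electron mass in kg (rounded)


def free_particle_action(mass_kg, proper_time_s):
    """S = -m c^2 * tau, returned in joule-seconds."""
    return -mass_kg * C**2 * proper_time_s


rest_energy = M_ELECTRON * C**2                  # joules, about 8.19e-14 J
action = free_particle_action(M_ELECTRON, 1.0)   # joule-seconds

print(rest_energy, action)
```

The result is an energy (joules) times a time (seconds), as expected for an action.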

This equation gives the action from the point of view of the point particle's own reference frame, along its own worldline (which is parameterized by its own proper time); that is, the point particle is its own observer. From its own point of view, its velocity and its displacement in space are always zero, and so are its momentum and its kinetic energy. You can say that the equation describes the aging of the particle: its mass (= its rest energy) is transported through time.