Geodesic equations via the variational principle

Your action is: $$ S[x] = -m\int_{\lambda_0}^{\lambda_1}\sqrt{-g_{\mu\nu}(x(\lambda))\, \tfrac{dx^{\mu}}{d\lambda}\tfrac{dx^{\nu}}{d\lambda}}\, d\lambda $$ and you have to impose $\delta S=0$ with the constraints $\delta x(\lambda_0) =\delta x(\lambda_1) =0$, which mean that the curves in the domain of $S$ have fixed endpoints.

To compute $\delta S$ you have to replace $x$ with $x+ \epsilon \delta x$ (so $\frac{dx}{d\lambda}$ must be replaced with $\frac{dx}{d\lambda} + \epsilon\frac{d \delta x}{d\lambda}$) and then take the derivative with respect to $\epsilon$ at $\epsilon=0$:

$$\delta S[x] = \frac{d}{d\epsilon}|_{\epsilon=0} S[x+ \epsilon \delta x]\:.$$
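The definition above can be tried out numerically. The following is only a sketch under stated assumptions that are not part of the answer: a flat 1+1 dimensional metric $g=\mathrm{diag}(-1,+1)$, $m=1$, the interval $[0,1]$, a polyline approximation of the integral, and the hypothetical helper names `action` and `first_variation`. It estimates $\frac{d}{d\epsilon}S[x+\epsilon\delta x]$ at $\epsilon=0$ by a central difference, for a straight timelike line and for a bent timelike curve with the same endpoints.

```python
import math

def action(curve, m=1.0, n=2000):
    """Approximate S[x] = -m * integral of sqrt(-g_{mu nu} x'^mu x'^nu) d(lambda)
    on [0, 1] for the assumed flat metric g = diag(-1, +1), using a polyline."""
    total = 0.0
    for i in range(n):
        t0, x0 = curve(i / n)
        t1, x1 = curve((i + 1) / n)
        dt, dx = t1 - t0, x1 - x0
        total += math.sqrt(dt * dt - dx * dx)   # proper time of one segment
    return -m * total

def first_variation(curve, delta, eps=1e-5):
    """Central-difference estimate of d/d(eps) S[x + eps*delta] at eps = 0."""
    def shifted(sign):
        return lambda lam: tuple(c + sign * eps * d
                                 for c, d in zip(curve(lam), delta(lam)))
    return (action(shifted(+1)) - action(shifted(-1))) / (2 * eps)

# Variation of the spatial coordinate only; it vanishes at both endpoints.
delta = lambda lam: (0.0, math.sin(math.pi * lam))

straight = lambda lam: (lam, 0.5 * lam)                       # straight timelike line
bent = lambda lam: (lam, 0.2 * math.sin(math.pi * lam))       # timelike, but bent

dS_straight = first_variation(straight, delta)
dS_bent = first_variation(bent, delta)
print(dS_straight, dS_bent)
```

In this flat example the straight line should give a first variation that is zero up to rounding, while the bent curve should not, which is what one expects if straight lines are the stationary curves of $S$ in flat spacetime.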

The computation leads to (assuming that $g$ and the curves are $C^1$; since the curves are defined on the compact interval $[\lambda_0,\lambda_1]$, one can safely swap the integral with the $\epsilon$ derivative, essentially by Lebesgue's dominated convergence theorem) $$ \delta S[x] = -\frac{m}{2}\int_{\lambda_0}^{\lambda_1}\frac{- \frac{\partial g_{\alpha \beta}}{\partial x^\delta} \delta x^\delta \tfrac{dx^{\alpha}}{d\lambda}\tfrac{dx^{\beta}}{d\lambda} - 2g_{\alpha\beta} \frac{d \delta x^\alpha}{d\lambda}\frac{d x^\beta}{d\lambda}}{\sqrt{-g_{\mu\nu}(x(\lambda))\, \tfrac{dx^{\mu}}{d\lambda}\tfrac{dx^{\nu}}{d\lambda}}}\, d\lambda\:. $$ Notice that $x$ also appears in $g_{\mu\nu}=g_{\mu\nu}(x)$, and this gives rise to the contribution $\frac{\partial g_{\mu\nu}(x)}{\partial x^\sigma}\delta x^\sigma$ you mentioned in your question.

The denominator in the integral does not vanish as we are varying our curve in the class of timelike curves joining the two fixed endpoints.

Integrating by parts, one gets: $$ \frac{2}{m}\delta S[x] = \int_{\lambda_0}^{\lambda_1}\delta x^\delta\frac{ \frac{\partial g_{\alpha \beta}}{\partial x^\delta} \tfrac{dx^{\alpha}}{d\lambda}\tfrac{dx^{\beta}}{d\lambda} }{\sqrt{-g_{\mu\nu}(x(\lambda))\, \tfrac{dx^{\mu}}{d\lambda}\tfrac{dx^{\nu}}{d\lambda}}}\, d\lambda -\int_{\lambda_0}^{\lambda_1} \delta x^\alpha\frac{d}{d \lambda}\frac{2g_{\alpha\beta} \frac{d x^\beta}{d\lambda} }{\sqrt{-g_{\mu\nu}(x(\lambda))\, \tfrac{dx^{\mu}}{d\lambda}\tfrac{dx^{\nu}}{d\lambda}}}\, d\lambda + [...]\delta x^\alpha(\lambda_1)-[...]\delta x^\alpha(\lambda_0)\:. $$ The last two terms can be dropped, as they vanish by hypothesis. Changing the name of some summed indices we end up with:

$$ \frac{2}{m}\delta S[x] = \int_{\lambda_0}^{\lambda_1}\delta x^\delta\left[\frac{ \frac{\partial g_{\alpha \beta}}{\partial x^\delta} \tfrac{dx^{\alpha}}{d\lambda}\tfrac{dx^{\beta}}{d\lambda} }{\sqrt{-g_{\mu\nu}(x(\lambda))\, \tfrac{dx^{\mu}}{d\lambda}\tfrac{dx^{\nu}}{d\lambda}}} -\frac{d}{d \lambda}\frac{2g_{\delta\beta} \frac{d x^\beta}{d\lambda} }{\sqrt{-g_{\mu\nu}(x(\lambda))\, \tfrac{dx^{\mu}}{d\lambda}\tfrac{dx^{\nu}}{d\lambda}}} \right]d\lambda\:. $$

Since the LHS must vanish for every choice of the variation $\delta x^\delta(\lambda)$, we conclude that $\delta S[x]=0$ on a curve $x=x(\lambda)$ is equivalent to the requirement that the curve satisfies: $$\frac{ \frac{\partial g_{\alpha \beta}}{\partial x^\delta} \tfrac{dx^{\alpha}}{d\lambda}\tfrac{dx^{\beta}}{d\lambda} }{\sqrt{-g_{\mu\nu}(x(\lambda))\, \tfrac{dx^{\mu}}{d\lambda}\tfrac{dx^{\nu}}{d\lambda}}} -\frac{d}{d \lambda}\frac{2g_{\delta\beta} \frac{d x^\beta}{d\lambda} }{\sqrt{-g_{\mu\nu}(x(\lambda))\, \tfrac{dx^{\mu}}{d\lambda}\tfrac{dx^{\nu}}{d\lambda}}} =0\:.\quad (1)$$

We can change parameter and use the proper time $\tau$, so that: $$d\lambda \sqrt{-g_{\mu\nu}(x(\lambda))\, \tfrac{dx^{\mu}}{d\lambda}\tfrac{dx^{\nu}}{d\lambda}} = d\tau$$ and (1), divided by twice the (nonvanishing) square root, becomes: $$\frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^\delta} \frac{dx^{\alpha}}{d\tau}\frac{dx^{\beta}}{d\tau} -\frac{d}{d \tau}\left(g_{\delta\beta} \frac{d x^\beta}{d\tau}\right) =0\:.\quad (2)$$

Expanding the last derivative, and changing the name of $\beta$ to $\mu$ in the last term: $$\frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^\delta} \frac{dx^{\alpha}}{d\tau}\frac{dx^{\beta}}{d\tau} - \frac{\partial g_{\delta\beta}} {\partial x^\sigma} \frac{d x^\sigma}{d\tau}\frac{d x^\beta}{d\tau} -g_{\delta\mu} \frac{d^2 x^\mu}{d\tau^2} =0\:.$$

In other words: $$\frac{d^2 x^\mu}{d\tau^2} - g^{\delta\mu} \frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^\delta} \frac{dx^{\alpha}}{d\tau}\frac{dx^{\beta}}{d\tau} + g^{\delta\mu} \frac{\partial g_{\delta\beta}} {\partial x^\sigma} \frac{d x^\sigma}{d\tau}\frac{d x^\beta}{d\tau} =0\:.$$

Renaming some indices: $$\frac{d^2 x^\mu}{d\tau^2} + \frac{1}{2}g^{\mu\delta}\left(2\frac{\partial g_{\delta \beta}}{\partial x^\sigma} - \frac{\partial g_{\sigma\beta}} {\partial x^\delta}\right) \frac{d x^\sigma}{d\tau}\frac{d x^\beta}{d\tau} =0\:.$$

Next, exploiting $g_{\delta \beta}= g_{\beta\delta}$: $$\frac{d^2 x^\mu}{d\tau^2} + \frac{1}{2}g^{\mu\delta}\left(\frac{\partial g_{\delta \beta}}{\partial x^\sigma} + \frac{\partial g_{\beta \delta}}{\partial x^\sigma}- \frac{\partial g_{\sigma\beta}} {\partial x^\delta}\right) \frac{d x^\sigma}{d\tau}\frac{d x^\beta}{d\tau} =0\:.$$

Now notice that: $$\frac{\partial g_{\delta \beta}}{\partial x^\sigma} \frac{d x^\sigma}{d\tau}\frac{d x^\beta}{d\tau} = \frac{\partial g_{\delta \sigma}}{\partial x^\beta} \frac{d x^\beta}{d\tau}\frac{d x^\sigma}{d\tau}=\frac{\partial g_{\delta \sigma}}{\partial x^\beta} \frac{d x^\sigma}{d\tau}\frac{d x^\beta}{d\tau}$$ so the identity found above can be rewritten as: $$\frac{d^2 x^\mu}{d\tau^2} + \frac{1}{2}g^{\mu\delta}\left(\frac{\partial g_{\delta \sigma}}{\partial x^\beta} + \frac{\partial g_{\beta \delta}}{\partial x^\sigma}- \frac{\partial g_{\sigma\beta}} {\partial x^\delta}\right) \frac{d x^\sigma}{d\tau}\frac{d x^\beta}{d\tau} =0\:.$$

We have found: $$\frac{d^2 x^\mu}{d\tau^2} + \Gamma^\mu_{\sigma\beta}\frac{d x^\sigma}{d\tau}\frac{d x^\beta}{d\tau} =0\:,$$ as desired.
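The coefficient that multiplies the two velocities in the last equations, $\frac{1}{2}g^{\mu\delta}\left(\partial_\beta g_{\delta\sigma} + \partial_\sigma g_{\beta\delta} - \partial_\delta g_{\sigma\beta}\right)$, can be evaluated numerically for a concrete metric. The sketch below is an illustration only: the unit 2-sphere metric, the finite-difference derivatives, the restriction to two dimensions in `inverse_2x2`, and all function names are assumptions made here, not part of the derivation. It compares the result with the textbook values $\Gamma^\theta_{\phi\phi}=-\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi}=\cot\theta$.

```python
import math

def sphere_metric(x):
    """Hypothetical example metric: unit 2-sphere, coordinates x = (theta, phi)."""
    theta, _ = x
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def inverse_2x2(g):
    """Inverse of a 2x2 matrix (this sketch is limited to two dimensions)."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def metric_derivative(metric, x, k, h=1e-5):
    """Central-difference estimate of d g_{ab} / d x^k at the point x."""
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    gp, gm = metric(xp), metric(xm)
    n = len(x)
    return [[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(n)] for a in range(n)]

def christoffel(metric, x):
    """gamma[mu][s][b] = 1/2 g^{mu d} (d_b g_{d s} + d_s g_{b d} - d_d g_{s b})."""
    n = len(x)
    ginv = inverse_2x2(metric(x))
    dg = [metric_derivative(metric, x, k) for k in range(n)]   # dg[k][a][b]
    return [[[0.5 * sum(ginv[mu][d] * (dg[b][d][s] + dg[s][b][d] - dg[d][s][b])
                        for d in range(n))
              for b in range(n)] for s in range(n)] for mu in range(n)]

theta = 1.0
gamma = christoffel(sphere_metric, (theta, 0.3))
print(gamma[0][1][1], -math.sin(theta) * math.cos(theta))   # Gamma^theta_{phi phi}
print(gamma[1][0][1], math.cos(theta) / math.sin(theta))    # Gamma^phi_{theta phi}
```

The two numbers on each printed line should agree up to the finite-difference error.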


There are two definitions of geodesics here.

  1. Locally distance-minimizing curves. You minimize the action, as you did, \begin{equation} S(\gamma) = \int_a^b \sqrt{ g_{\mu\nu} \dot{x}^{\mu} \dot{x}^{\nu} }\, dt = \int_a^b L\, dt \end{equation} The Euler-Lagrange equation associated with this action is \begin{equation} \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}^{\mu}}\right) - \frac{\partial L}{\partial x^\mu}= 0 \end{equation}

  2. Curves along which the tangent vector is parallel transported. You minimize the action \begin{equation} E(\gamma) = \int_a^b \frac{1}{2}g_{\mu\nu} \dot{x}^{\mu} \dot{x}^{\nu}\, dt = \int_a^b \frac{1}{2}L^2\, dt \end{equation} whose Euler-Lagrange equation, written in terms of $L$, is \begin{equation} \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}^{\mu}}\right) - \frac{\partial L}{\partial x^\mu} = -\frac{1}{L}\frac{\partial L}{\partial \dot{x}^{\mu}} \frac{d}{dt}L \end{equation} and get the solution (after some algebra) \begin{equation} \ddot{x}^{\lambda} + \Gamma^{\lambda}_{\mu\nu} \dot{x}^\mu \dot{x}^\nu = 0 \end{equation} This curve parallel transports its tangent vector.
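The algebra skipped in item 2 can be sketched as follows, assuming $L\neq 0$ along the curve and a symmetric metric (the index names are chosen here for illustration). The Euler-Lagrange equation of the Lagrangian $\frac{1}{2}L^2$ is \begin{equation} \frac{d}{dt}\left(L\frac{\partial L}{\partial \dot{x}^{\mu}}\right) - L\frac{\partial L}{\partial x^\mu} = 0 , \end{equation} and expanding the product and dividing by $L$ gives the form quoted in item 2. Working instead directly with $\frac{1}{2}L^2 = \frac{1}{2}g_{\alpha\beta}\dot{x}^\alpha\dot{x}^\beta$, \begin{equation} \frac{d}{dt}\left(g_{\mu\beta}\dot{x}^\beta\right) - \frac{1}{2}\frac{\partial g_{\alpha\beta}}{\partial x^\mu}\dot{x}^\alpha\dot{x}^\beta = g_{\mu\beta}\ddot{x}^\beta + \left(\frac{\partial g_{\mu\beta}}{\partial x^\alpha} - \frac{1}{2}\frac{\partial g_{\alpha\beta}}{\partial x^\mu}\right)\dot{x}^\alpha\dot{x}^\beta = 0 . \end{equation} Contracting with $g^{\lambda\mu}$ and symmetrizing the first derivative term in $\alpha$ and $\beta$ then gives \begin{equation} \ddot{x}^{\lambda} + \frac{1}{2}g^{\lambda\mu}\left(\frac{\partial g_{\mu\beta}}{\partial x^\alpha} + \frac{\partial g_{\mu\alpha}}{\partial x^\beta} - \frac{\partial g_{\alpha\beta}}{\partial x^\mu}\right)\dot{x}^\alpha\dot{x}^\beta = 0 , \end{equation} where the coefficient is $\Gamma^{\lambda}_{\alpha\beta}$.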

We notice that \begin{eqnarray} \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}^{\mu}}\right) - \frac{\partial L}{\partial x^\mu} = 0\\ \frac{d}{dt}L = 0 \end{eqnarray} solves the Euler-Lagrange equations in both cases. The condition $\frac{d}{dt}L = 0$ just fixes the parameterization. For a Riemannian manifold, the parameter differs from the length of the curve by an affine transformation, because $L$ is constant along the solution: \begin{equation} S(\gamma_{sol}(t)) = \int_a^t L(\gamma_{sol}(\tau))\, d\tau = (t-a)\, L(\gamma_{sol}(a)) \end{equation} That is why some textbooks use the length (or proper time) as the parameter in the first place.
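The statement that $L$ stays constant along an affinely parameterized solution, so that the length grows linearly with the parameter, can be checked on an example. This is a sketch under assumptions introduced here, not taken from the answer: the unit 2-sphere with $g=\mathrm{diag}(1,\sin^2\theta)$, its standard Christoffel symbols written out by hand, a fixed-step RK4 integrator, and arbitrary initial data.

```python
import math

def rhs(state):
    """Geodesic equation on the unit 2-sphere (assumed example metric
    g = diag(1, sin^2 theta)), written as a first-order system."""
    theta, phi, dtheta, dphi = state
    ddtheta = math.sin(theta) * math.cos(theta) * dphi ** 2
    ddphi = -2.0 * math.cos(theta) / math.sin(theta) * dtheta * dphi
    return (dtheta, dphi, ddtheta, ddphi)

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    def add(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(add(state, k1, h / 2))
    k3 = rhs(add(state, k2, h / 2))
    k4 = rhs(add(state, k3, h))
    return tuple(s + h / 6 * (p + 2 * q + 2 * r + w)
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4))

def speed(state):
    """L = sqrt(g_{mu nu} xdot^mu xdot^nu) for the sphere metric."""
    theta, _, dtheta, dphi = state
    return math.sqrt(dtheta ** 2 + (math.sin(theta) * dphi) ** 2)

t_a, t_b, steps = 0.0, 1.0, 1000
h = (t_b - t_a) / steps
state = (1.0, 0.0, 0.3, 0.7)        # arbitrary initial point and velocity
L0 = speed(state)
length = 0.0
max_drift = 0.0
for _ in range(steps):
    new_state = rk4_step(state, h)
    length += 0.5 * h * (speed(state) + speed(new_state))   # trapezoid rule
    state = new_state
    max_drift = max(max_drift, abs(speed(state) - L0))

print(max_drift)                     # deviation of L from its initial value
print(length, (t_b - t_a) * L0)      # accumulated length vs. (t - a) * L
```

If the claim holds, the drift should sit at the level of the integration error and the two lengths should agree.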

A curve satisfying the second definition is sometimes called an affine geodesic. In GR, in most cases we are talking about affine geodesics, that is, a locally minimizing curve together with an affine parameterization.