What is the motivation for the equation of the Sturm-Liouville problem?

For me, it's fairly simple, really. Suppose you have the problem $$ L u = a(x) u''(x) + b(x) u'(x) + c(x) u = f(x), $$ with boundary conditions $$ B u = \begin{pmatrix} \alpha_{11} u(x_1) + \alpha_{12} u'(x_1) \\ \alpha_{21} u(x_2) + \alpha_{22} u'(x_2) \end{pmatrix} = 0, $$ where $a$, $b$, $c$ and $f$ are smooth enough for all the following calculations to make sense, and $\alpha_{11}\alpha_{22} - \alpha_{12}\alpha_{21} \neq 0$. In a very broad sense, we are thinking of an operator $L$ that takes a function $u$ in the space $X = \{ u \text{ smooth enough such that } B u = 0 \}$ and sends it to the function $f(x)$ in a space $Y$ (with the needed characteristics, I'm not going to get technical here): $$ L: X \longrightarrow Y. $$ In this setting, a fair question is "What are the properties of $L$?"

If we define the inner product as $$ \langle u(x), v(x) \rangle := \int_{x_1}^{x_2} u(x)v(x) dx $$ and the adjoint $L^*$ through the identity $$ \langle u(x), L^* v(x) \rangle := \langle L u(x), v(x) \rangle, $$ then, with the proper definitions, we can study $L$ in the same way we studied matrices. So, let's take a look at the expression $\langle L u(x), v(x) \rangle$. Integrating by parts (twice for the $a u''$ term, once for the $b u'$ term), we have \begin{multline} \langle L u(x), v(x) \rangle = \big(a u'v - u (a v)' + u(bv)\big)\big|_{x_1}^{x_2} \\ + \int_{x_1}^{x_2}\big(a v'' + (2 a' - b)v' + (a'' - b' + c)v\big)u dx \end{multline} Then, $$ L^* v = a v'' + (2 a' - b)v' + (a'' - b' + c)v $$ and $B = B^*$ (can you prove this?).
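
As a quick sanity check of the formula for $L^*$, take the constant-coefficient operator $Lu = u'' + u'$ (an example chosen here only for illustration), i.e. $a = 1$, $b = 1$, $c = 0$: $$ L^* v = v'' + (2\cdot 0 - 1)\, v' + (0 - 0 + 0)\, v = v'' - v'. $$ So $L \neq L^*$ in this case: the first-order term changes sign, as one would expect from a single integration by parts.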

As with matrices, a very special kind of operator is the self-adjoint operator, $L = L^*$. Why? Well, among other things, because all its eigenvalues are real and its eigenfunctions belonging to distinct eigenvalues are orthogonal (you should prove this also). Now, what does a self-adjoint operator look like for second order ODEs? The functions $a$ and $b$ must satisfy \begin{align} 2 a' - b &= b,\\ a'' - b' + c &= c. \end{align} Thus, $a' = b$, and $L$ can be written as $$ L u = a u'' + a' u' + c u = (a u')' + c u, $$ which is a Sturm-Liouville operator!
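
Going back to the orthogonality claim, here is a sketch of the standard argument. The symbols $\lambda$, $\mu$ are introduced only for this sketch: assume $u, v \in X$ satisfy $Lu = \lambda u$ and $Lv = \mu v$ with real $\lambda \neq \mu$, and that $L = L^*$. Then $$ \lambda \langle u, v\rangle = \langle Lu, v\rangle = \langle u, L^* v\rangle = \langle u, Lv\rangle = \mu\langle u, v\rangle \quad\Longrightarrow\quad (\lambda-\mu)\langle u,v\rangle = 0, $$ so $\langle u, v\rangle = 0$. The statement that the eigenvalues are real follows from the same chain of equalities once the inner product is replaced by its complex (conjugated) version.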

So, Sturm-Liouville operators are self-adjoint operators and have all the fantastic properties of self-adjoint operators. Great! But they seem rather rare, as the condition $a' = b$ seems pretty restrictive. Why on Earth, then, do we put so much effort into studying them? Well, what if instead of the regular inner product that I took, I use the inner product $$ \langle u(x), v(x) \rangle_{w(x)} = \int_{x_1}^{x_2} u(x) v(x) w(x) dx, $$ where $w(x) > 0$?

Then, $$ w L u = w a u'' + w b u ' + w c u = (w a u')' + (w b - (w a)')u' + w c u, $$ and if $w$ satisfies the ODE $$ a w' + (a' - b) w = 0, $$ so that the coefficient of $u'$ vanishes, the operator $wL$ is self-adjoint!
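
The effect of the weight can be checked numerically. The sketch below is only an illustration under assumptions that are not in the original answer: the operator $Lu = u'' + u'$ on $[0,1]$ (so $a = b = 1$, $c = 0$), the test functions $\sin(\pi x)$ and $\sin(2\pi x)$, which vanish at both endpoints, and the weight $w = e^x$, which makes the coefficient $w b - (w a)'$ of $u'$ vanish. It compares $\langle Lu, v\rangle - \langle u, Lv\rangle$ with and without the weight, using a hand-written Simpson rule.

```python
import math

def simpson(f, x1, x2, n=2000):
    """Composite Simpson rule on [x1, x2] with n (even) subintervals."""
    h = (x2 - x1) / n
    total = f(x1) + f(x2)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(x1 + i * h)
    return total * h / 3

PI = math.pi

# Test functions vanishing at x = 0 and x = 1, with hand-written derivatives.
def u(x):   return math.sin(PI * x)
def du(x):  return PI * math.cos(PI * x)
def d2u(x): return -PI**2 * math.sin(PI * x)

def v(x):   return math.sin(2 * PI * x)
def dv(x):  return 2 * PI * math.cos(2 * PI * x)
def d2v(x): return -(2 * PI)**2 * math.sin(2 * PI * x)

# Illustrative operator (an assumption of this sketch): L u = u'' + u'.
def Lu(x): return d2u(x) + du(x)
def Lv(x): return d2v(x) + dv(x)

def w(x):
    """Weight with (w a)' = w b for a = b = 1, i.e. w = exp(x)."""
    return math.exp(x)

# <Lu, v> - <u, Lv> with the plain and with the weighted inner product.
plain_gap = simpson(lambda x: Lu(x) * v(x) - u(x) * Lv(x), 0.0, 1.0)
weighted_gap = simpson(lambda x: w(x) * (Lu(x) * v(x) - u(x) * Lv(x)), 0.0, 1.0)

print("plain inner product:   ", plain_gap)     # clearly nonzero
print("weighted inner product:", weighted_gap)  # zero up to quadrature error
```

By hand, the unweighted difference is $2\int_0^1 u'v\,dx = 8/3$, while the weighted one is the boundary term $\big[e^x(u'v - uv')\big]_0^1 = 0$.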

In conclusion,

With the proper inner product ($w(x) = e^{\int \frac{b-a'}{a} dx}$), every linear second order ODE (with $a \neq 0$) can be studied as a Sturm-Liouville problem, where the operator is self-adjoint!
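
A classical instance, added here as an illustration: for the Hermite equation $u'' - 2x u' + 2n u = 0$ we have $a = 1$, $b = -2x$, $c = 2n$, so $$ w(x) = e^{\int \frac{-2x - 0}{1} dx} = e^{-x^2}, \qquad w L u = e^{-x^2} u'' - 2x e^{-x^2} u' + 2n e^{-x^2} u = \big(e^{-x^2} u'\big)' + 2n e^{-x^2} u, $$ which is in Sturm-Liouville form with the familiar Gaussian weight.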

With one stroke, we've enclosed all second order ODEs into one, which happens to be self-adjoint. Those guys really knew what they were doing!


I would say that the main motivation for this problem comes from the associated variational framework. For simplicity let's consider the problem to be augmented with homogeneous boundary conditions, namely $y(0) = y(L) =0$. We then define the space $$ H^1_0((0,L)) = \{f : (0,L) \to \mathbb{R} : f, f' \in L^2((0,L)), f(0)=f(L) =0\}, $$ which is well-defined via the theory of traces of Sobolev functions.

We can then define the functionals $E,J : H^1_0((0,L)) \to \mathbb{R}$ via $$ E(y) = \int_0^L p |y'|^2 + q |y|^2, \qquad J(y) = \int_0^L w|y|^2. $$ Now the key feature is that part of the S-L equations is the Euler-Lagrange equation for $E$, i.e. critical points of $E$ must satisfy $$ 0 = DE(y) = -(p y')' + qy $$ (dropping an overall factor of $2$), which comes from computing $$ 0 = \frac{d}{dt}\Big|_{t=0} E(y + t \psi) = 2\int_0^L p y' \psi' + q y \psi $$ for every $\psi \in H^1_0((0,L))$ and then integrating by parts; the boundary terms vanish because $\psi(0)=\psi(L)=0$.

Where does $J$ come in? Well, the variational derivative of it is $$ DJ(y) = wy, $$ which can be seen by a similar argument to the one above. Now, if instead of attempting to minimize $E$ we look for constrained minimizers, i.e. we look for $$ \min\{ E(y) : J(y) =1 \} $$ then the theory of Lagrange multipliers requires that $$ DE(y) = \lambda DJ(y) \text{ for some }\lambda \in \mathbb{R}, $$ which is equivalent to $$ -(py')' + qy = \lambda w y. $$ This is the form you describe above, up to flipping the sign on $q$, which doesn't really matter since in SL theory we don't assume it has a sign.

The above is really for "global minimizers" of $E$ subject to the constraint that $J=1$, but it can be pushed further to get all possible "eigenvalues" $\lambda$. For instance, once the global minimizer $y_1$ is known with eigenvalue $\lambda_1$, we can consider $$ \min\{ E(y) : J(y)=1 \text{ and } \int_0^L w y y_1 =0 \}. $$ This will also produce a solution to $$ -(py')' + qy = \lambda w y, $$ but now for a new pair $y_2$, $\lambda_2$ with $\lambda_2 \ge \lambda_1$.
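
The two constrained minimizations above can be mimicked numerically. What follows is a minimal sketch, not a definitive implementation, and everything specific in it is an assumption made for illustration: a finite-difference discretization of $E$ and $J$ on a uniform grid, inverse iteration as the minimization method, the function names `solve_tridiagonal` and `sl_eigenpairs`, and the test case $p = 1$, $q = 0$, $w = 1$ on $(0,1)$, whose exact eigenvalues are $\pi^2$ and $4\pi^2$. The orthogonality constraint $\int_0^L w\, y\, y_1 = 0$ is imposed by projecting out previously found eigenfunctions.

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i]/upper[i] multiply x[i-1]/x[i+1] in row i."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def sl_eigenpairs(p, q, w, length, n=199, how_many=2, iters=200):
    """Approximate the lowest eigenpairs of -(p y')' + q y = lam w y, y(0)=y(L)=0."""
    h = length / (n + 1)
    xs = [(i + 1) * h for i in range(n)]                # interior grid points
    p_half = [p((i + 0.5) * h) for i in range(n + 1)]   # p at cell midpoints
    diag = [(p_half[i] + p_half[i + 1]) / h**2 + q(xs[i]) for i in range(n)]
    lower = [-p_half[i] / h**2 for i in range(n)]       # lower[0] is unused
    upper = [-p_half[i + 1] / h**2 for i in range(n)]   # upper[n-1] is unused
    wv = [w(x) for x in xs]

    def w_inner(f, g):
        """Discrete version of the weighted integral of f * g * w."""
        return h * sum(wi * fi * gi for wi, fi, gi in zip(wv, f, g))

    def apply_A(y):
        """Apply the discretized operator -(p y')' + q y."""
        out = []
        for i in range(n):
            s = diag[i] * y[i]
            if i > 0:
                s += lower[i] * y[i - 1]
            if i < n - 1:
                s += upper[i] * y[i + 1]
            out.append(s)
        return out

    pairs = []
    for _ in range(how_many):
        y = [1.0 + x for x in xs]                       # generic starting guess
        for _ in range(iters):
            for _, z in pairs:                          # enforce <y, z>_w = 0
                coeff = w_inner(y, z)
                y = [yi - coeff * zi for yi, zi in zip(y, z)]
            y = solve_tridiagonal(lower, diag, upper,
                                  [wi * yi for wi, yi in zip(wv, y)])
            norm = math.sqrt(w_inner(y, y))             # rescale so that J(y) = 1
            y = [yi / norm for yi in y]
        energy = h * sum(yi * ai for yi, ai in zip(y, apply_A(y)))  # E(y)
        pairs.append((energy, y))                       # lam = E(y) since J(y) = 1
    return pairs

(lam1, y1), (lam2, y2) = sl_eigenpairs(lambda x: 1.0, lambda x: 0.0,
                                       lambda x: 1.0, length=1.0)
print(lam1, math.pi**2)        # approximation vs exact first eigenvalue
print(lam2, 4 * math.pi**2)    # approximation vs exact second eigenvalue
```

Since each iterate is rescaled so that the discrete $J(y) = 1$, the value of the discrete $E(y)$ at convergence plays the role of the Lagrange multiplier $\lambda$. The no-pivoting tridiagonal solve assumes $p > 0$ and $q \ge 0$.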

Proceeding in this way one can ultimately view all solutions to the S-L equations in this form. All of this was done under the assumption that $y(0)=y(L) = 0$, but other boundary conditions work in this framework too.