Why are there only derivatives to the first order in the Lagrangian?

I reproduce a blog post I wrote some time ago:

We tend not to use higher-derivative theories. It turns out that there is a very good reason for this, but that reason is rarely discussed in textbooks. We will take, for concreteness, $L(q,\dot q, \ddot q)$, a Lagrangian which depends on the second derivative in an essential manner. Inessential dependences are terms such as $q\ddot q$, which may be integrated by parts to give $-{\dot q}^2$ up to a total derivative. Mathematically, essential dependence (non-degeneracy) is expressed through the necessity of being able to invert the expression $$P_2 = \frac{\partial L\left(q,\dot q, \ddot q\right)}{\partial \ddot q},$$ and get a closed form for $\ddot q (q, \dot q, P_2)$. Note that usually we also require a similar statement for $\dot q (q, p)$, and failure in this respect is a sign of having a constrained system, possibly with gauge degrees of freedom.
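
As a quick check of the integration-by-parts statement, the product rule gives $$q\ddot q = \frac{d}{dt}\left(q\dot q\right) - {\dot q}^2,$$ and the total derivative changes the action only by boundary terms, so it drops out of the equations of motion. A Lagrangian containing $q\ddot q$ is therefore equivalent to one containing $-{\dot q}^2$, which is first order.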

In any case, the non-degeneracy leads to the Euler-Lagrange equation in the usual manner: $$\frac{\partial L}{\partial q} - \frac{d}{dt}\frac{\partial L}{\partial \dot q} + \frac{d^2}{dt^2}\frac{\partial L}{\partial \ddot q} = 0.$$ This is then fourth order in $t$, and so requires four initial conditions, such as $q$, $\dot q$, $\ddot q$, $q^{(3)}$. This is twice as many as usual, and so we can get a new pair of conjugate variables when we move into a Hamiltonian formalism. We follow the steps of Ostrogradski, and choose our canonical variables as $Q_1 = q$, $Q_2 = \dot q$, which leads to \begin{align} P_1 &= \frac{\partial L}{\partial \dot q} - \frac{d}{dt}\frac{\partial L}{\partial \ddot q}, \\ P_2 &= \frac{\partial L}{\partial \ddot q}. \end{align} Note that the non-degeneracy allows $\ddot q$ to be expressed in terms of $Q_1$, $Q_2$ and $P_2$ through the second equation, and the first one is only necessary to define $q^{(3)}$.

We can then proceed in the usual fashion, and find the Hamiltonian through a Legendre transform: \begin{align} H &= \sum_i P_i \dot{Q}_i - L \\ &= P_1 Q_2 + P_2 \ddot{q}\left(Q_1, Q_2, P_2\right) - L\left(Q_1, Q_2,\ddot{q}\right). \end{align} Again, as usual, we can take the time derivative of the Hamiltonian to find that it is time independent if the Lagrangian does not depend on time explicitly, and thus can be identified as the energy of the system.
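
To make the construction concrete, here is a sketch for the toy Lagrangian $L = \tfrac{1}{2}{\ddot q}^2 - V(q)$, an illustrative choice assumed here rather than one taken from the blog post. The momenta are \begin{align} P_2 &= \frac{\partial L}{\partial \ddot q} = \ddot q, \\ P_1 &= \frac{\partial L}{\partial \dot q} - \frac{d}{dt}\frac{\partial L}{\partial \ddot q} = -q^{(3)}, \end{align} so that $\ddot q = P_2$ and the Legendre transform gives $$H = P_1 Q_2 + \tfrac{1}{2}P_2^2 + V(Q_1).$$ Hamilton's equations $\dot Q_1 = Q_2$, $\dot Q_2 = P_2$, $\dot P_2 = -P_1$ and $\dot P_1 = -V'(Q_1)$ chain together into $q^{(4)} = V'(q)$, which is what the Euler-Lagrange equation above gives for this $L$.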

However, we now have a problem: $H$ has only a linear dependence on $P_1$, and so can be arbitrarily negative. In an interacting system this means that we can excite positive energy modes by transferring energy from the negative energy modes, and in doing so we would increase the entropy — there would simply be more particles, and so a need to put them somewhere. Thus such a system could never reach equilibrium, exploding instantly in an orgy of particle creation. This problem is in fact completely general, and applies to even higher derivatives in a similar fashion.
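
Both statements, that $H$ is conserved and that it is unbounded below, can be checked numerically. The following is a minimal sketch, assuming the toy Lagrangian $L = \tfrac{1}{2}{\ddot q}^2 - \tfrac{1}{2}q^2$ (an illustrative choice, not from the blog post), for which Ostrogradski's recipe gives $H = P_1 Q_2 + \tfrac{1}{2}P_2^2 + \tfrac{1}{2}Q_1^2$. The function names, the initial state and the step size are all hypothetical.

```python
# Toy Ostrogradski system (assumed example): L = 0.5*qddot**2 - 0.5*q**2
# Phase-space state is the tuple (Q1, Q2, P1, P2).

def hamiltonian(state):
    q1, q2, p1, p2 = state
    # Note the term p1 * q2: it is linear in P1, with no P1**2 to bound it.
    return p1 * q2 + 0.5 * p2**2 + 0.5 * q1**2

def rhs(state):
    q1, q2, p1, p2 = state
    # Hamilton's equations: dQ_i/dt = dH/dP_i, dP_i/dt = -dH/dQ_i
    return (q2, p2, -q1, -p1)

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, dt / 2))
    k3 = rhs(shifted(state, k2, dt / 2))
    k4 = rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def evolve(state, dt=1e-3, steps=1000):
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state

start = (1.0, 0.5, -2.0, 0.3)
end = evolve(start)
# H is conserved along the trajectory, up to integration error.
print(hamiltonian(start), hamiltonian(end))

# At fixed Q2 > 0, making P1 more negative lowers H without limit.
for p1 in (-1e0, -1e3, -1e6):
    print(p1, hamiltonian((1.0, 0.5, p1, 0.3)))
```

The second loop only evaluates $H$ at different phase-space points; it shows that the energy has no floor, not how an interacting system would actually get there.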

Excellent question, and one that I've never really found a completely satisfactory answer for. But consider this: in elementary classical mechanics, one of the fundamental laws is Newton's second law, $\mathbf{F} = m\mathbf{a}$, which relates the force on an object to the object's acceleration. Now, most forces are exerted by one particular object on another particular object, and the value of the force depends only on the positions of the source and "target" objects. In conjunction with Newton's second law, this means that, in a classical system with $N$ objects, each one obeys an equation of the form

$$\ddot{\mathbf{x}}_i = \mathbf{f}(\{\mathbf{x}_j|j\in 1,\ldots,N\})$$

where $\mathbf{f}$ is some vector-valued function. The point of this equation is that, if you have the positions of all the objects, you can compute the accelerations of all the objects.

By taking the derivative of that equation, you get

$${\dddot{\mathbf{x}}}_i = \mathbf{f'}(\{\mathbf{x}_j\})\{\dot{\mathbf{x}}_j\}$$

(I'm getting quite loose with the notation here ;p) This allows you to compute the jerk (third derivative) using the positions and velocities. And you can repeat this procedure to get a formula (at least in some abstract sense) for any higher derivative. To put it in simple terms, since Newton's second law relates functions which are two orders of derivative apart, you only need the 0th and 1st derivatives, position and velocity, to "bootstrap" the process, after which you can compute any higher derivative you want, and from that any physical quantity. This is analogous to (and in fact closely related to) the fact that to solve a second-order differential equation, you only need two initial conditions, one for the value of the function and one for its derivative.
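
The bootstrapping argument can be sketched in code. This is a minimal illustration, assuming a single 1D harmonic oscillator with $\ddot x = -\omega^2 x$ (my choice of force law, with hypothetical function names): every higher derivative at $t = 0$ is generated from the position and velocity alone, and the resulting Taylor series is compared with the known closed-form solution.

```python
import math

def derivatives_from_state(x0, v0, omega, order):
    """Return [x, x', x'', ..., x^(order)] at t = 0 for x'' = -omega**2 * x."""
    derivs = [x0, v0]
    for n in range(2, order + 1):
        # Differentiating x'' = -omega^2 x repeatedly gives
        # x^(n) = -omega^2 * x^(n-2), so two starting values suffice.
        derivs.append(-omega**2 * derivs[n - 2])
    return derivs[: order + 1]

def taylor_position(x0, v0, omega, t, order=30):
    """Position at time t from the truncated Taylor series about t = 0."""
    derivs = derivatives_from_state(x0, v0, omega, order)
    return sum(d * t**n / math.factorial(n) for n, d in enumerate(derivs))

x0, v0, omega, t = 1.0, 0.5, 2.0, 0.7
approx = taylor_position(x0, v0, omega, t)
exact = x0 * math.cos(omega * t) + (v0 / omega) * math.sin(omega * t)
print(abs(approx - exact))  # difference between series and exact solution
```

The entry `derivatives_from_state(...)[3]` is the jerk, computed here as $-\omega^2 \dot x$, which is the 1D version of the $\mathbf{f'}\,\dot{\mathbf{x}}$ formula above.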

The story gets more complicated in other branches of physics, but still, if you look at most of them you will find that the fundamental evolution equation relates the value of some function to its first and second derivatives, but no higher. For example, in quantum mechanics you have the Schrödinger equation,

$$i\hbar\frac{\partial\Psi}{\partial t} = -\frac{\hbar^2}{2m}\frac{\partial^2 \Psi}{\partial x^2} + U(x)\Psi$$

or in quantum field theory, the Klein-Gordon equation,

$$-\frac{\partial^2\phi}{\partial t^2} + \frac{\partial^2\phi}{\partial x^2} - m^2\phi = 0$$

and others, or Maxwell's equations (equivalently, the wave equation that can be derived from them) in classical electromagnetism. In each case, you can use a similar argument to at least motivate the fact that only position or its equivalent field and its first derivative are enough to specify the entire state of the system.
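
For the Klein-Gordon equation above, for instance, the same bootstrap can be sketched by solving for the highest time derivative, $$\frac{\partial^2\phi}{\partial t^2} = \frac{\partial^2\phi}{\partial x^2} - m^2\phi,$$ and differentiating once more in time, $$\frac{\partial^3\phi}{\partial t^3} = \left(\frac{\partial^2}{\partial x^2} - m^2\right)\frac{\partial\phi}{\partial t}.$$ So $\phi$ and $\partial\phi/\partial t$ on a time slice determine every higher time derivative. The Schrödinger equation is first order in time, so there $\Psi$ alone plays this role.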

Of course, you might still wonder why the equations that describe the universe relate functions that are only two derivatives apart, rather than three or four. That part is a mystery, but one that falls in the realm of philosophy rather than physics.

There are implications for causality when an equation of motion contains higher than second derivatives of the fields. For example, the back-reaction of EM radiation on a charged body involves the derivative of the acceleration.

I don't know the details of why, but this book should give more: Causality and Dispersion Relations, http://books.google.com/books?id=QDzHqxE4anEC&lpg=PP1&dq=causality%20dispersion%20relations&pg=PP1#v=onepage&q&f=false