Find a controller for a control system with a cubic state that minimizes an integral function

When the dynamics are nonlinear (or the cost function inside the integral is not just quadratic), a good starting point is Pontryagin's maximum (or minimum) principle (PMP), which is a Hamiltonian approach. The general optimal control problem that PMP can solve has the following form:

$$ \min_{u(t)} \int_0^T g(t, x(t), u(t))\,dt + g_T(x(T)), \tag{1} $$

with

$$ \dot{x} = f(t, x(t), u(t)), \quad x(0) = x_0. \tag{2} $$

PMP states that this problem can be solved with the Hamiltonian

$$ H(t, x(t), u(t), \lambda(t)) = \lambda(t)^\top f(t, x(t), u(t)) + g(t, x(t), u(t)), \tag{3} $$

where $\lambda(t)$ is called the co-state and has the same dimension as $x(t)$. The dynamics of the system under the optimal control law, together with the co-state dynamics and the optimality condition, can then be expressed using

$$ \dot{\lambda}(t) = -\left[\frac{\partial}{\partial\,x} H(t, x(t), u(t), \lambda(t))\right]^\top, \tag{4} $$

$$ \dot{x}(t) = \left[\frac{\partial}{\partial\,\lambda} H(t, x(t), u(t), \lambda(t))\right]^\top, \tag{5} $$

$$ 0 = \frac{\partial}{\partial\,u} H(t, x(t), u(t), \lambda(t)). \tag{6} $$

It can be noted that substituting $(3)$ into $(5)$ gives $(2)$ back. If the input $u(t)$ is constrained, $(6)$ is replaced by choosing $u(t)$ such that the Hamiltonian is minimized over the admissible inputs; if $u(t)$ is unconstrained and the Hamiltonian is convex in $u(t)$, this is equivalent to $(6)$. If some (or all) components of the state are constrained at the terminal time, $x_i(T) = c_i$, the corresponding co-states $\lambda_i(T)$ are left free. But if $x_j(T)$ is free, then $\lambda_j(T)$ is constrained, namely by

$$ \lambda_j(T) = \left[\frac{\partial\,g_T(x)}{\partial\,x}\right]^\top_{x = x(T)}. \tag{7} $$
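
As a side note, the conditions $(3)$–$(6)$ can be assembled mechanically with a computer algebra system. Below is a minimal SymPy sketch for generic $f$ and $g$ (SymPy and the symbol names are my own choices, not part of the problem statement):

```python
import sympy as sp

t, x, u, lam = sp.symbols('t x u lambda')
f = sp.Function('f')(t, x, u)   # system dynamics, eq. (2)
g = sp.Function('g')(t, x, u)   # running cost in eq. (1)

H = lam * f + g                          # Hamiltonian, eq. (3)
costate_dot = -sp.diff(H, x)             # eq. (4): co-state dynamics
state_dot = sp.diff(H, lam)              # eq. (5): recovers f, i.e. eq. (2)
stationarity = sp.Eq(sp.diff(H, u), 0)   # eq. (6): optimality condition

print(costate_dot)    # -lambda*Derivative(f(t, x, u), x) - Derivative(g(t, x, u), x)
print(state_dot)      # f(t, x, u)
print(stationarity)   # Eq(lambda*Derivative(f(t, x, u), u) + Derivative(g(t, x, u), u), 0)
```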


In your case $f(t,x,u) = -x^3 + u$, $g(t,x,u) = x^2 + u^2$ and $g_T(x(T)) = 0$, which, using $(3)$, gives the following Hamiltonian:

$$ H(x, u, \lambda) = \lambda(t)\,(-x^3 + u) + x^2 + u^2. \tag{8} $$

Plugging this into $(4)$ gives

$$ \dot{\lambda} = x\,(3\,\lambda\,x - 2). \tag{9} $$

Since there are no constraints on $u$, condition $(6)$ can be used:

$$ \frac{\partial\,H}{\partial\,u} = 2\,u + \lambda = 0, \tag{10} $$

which yields $u = -\tfrac{1}{2}\lambda$. Plugging this into $f(t,x,u)$ allows us to express the combined state and co-state dynamics as

\begin{align} \dot{x} &= -x^3 - \frac{1}{2} \lambda, \tag{11a} \\ \dot{\lambda} &= x\,(3\,\lambda\,x - 2), \tag{11b} \end{align}

with $\lambda(T{=}\infty) = 0$, since $g_T = 0$. Here is where the limitations of PMP come in: PMP does not provide a general way to easily solve for $\lambda(0)$, and it is normally also not ideal when dealing with $T=\infty$. For $\lambda(\infty) = 0$ to hold it must also be true that $\dot{\lambda}(\infty) = 0$, which combined with $(11b)$ implies $x(\infty) = 0$ (which also makes sense when considering the cost function).
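
For completeness, a common workaround is to truncate the horizon and solve $(11)$ as a two-point boundary value problem numerically. The sketch below does this with SciPy's `solve_bvp`, assuming $x(0)=1$ and a finite horizon $T=10$ as a stand-in for infinity (these values, and the crude initial guesses, are arbitrary choices of mine and may need tuning):

```python
import numpy as np
from scipy.integrate import solve_bvp

x0, T = 1.0, 10.0   # assumed initial state and truncated horizon

def rhs(t, y):
    x, lam = y
    return np.vstack((-x**3 - lam / 2.0,              # (11a)
                      x * (3.0 * lam * x - 2.0)))     # (11b)

def bc(ya, yb):
    return np.array([ya[0] - x0,   # x(0) = x0
                     yb[1]])       # lambda(T) = 0, approximating lambda(inf) = 0

t = np.linspace(0.0, T, 200)
y_guess = np.zeros((2, t.size))
y_guess[0] = x0 * np.exp(-t)       # crude guess: decaying state
y_guess[1] = np.exp(-t)            # crude guess: decaying co-state

sol = solve_bvp(rhs, bc, t, y_guess)
lam0_bvp = sol.y[1, 0]
lam0_closed_form = -2.0 * (-x0 * np.sqrt(1.0 + x0**4) + x0**3)   # eq. (16), derived below
print(sol.status, lam0_bvp, lam0_closed_form)
```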

In this case I found it easier to perform a coordinate transformation, namely by differentiating $\dot{x}$ once more and expressing $\lambda$ as a function of $x$ and $\dot{x}$. Doing this and simplifying the expressions yields

$$ \ddot{x} = x + 3\,x^5, \tag{12} $$

$$ \lambda = -2\left(\dot{x} + x^3\right). \tag{13} $$
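
The elimination of $\lambda$ is easy to verify with a computer algebra system; the following SymPy sketch (my own check, not part of the derivation) reproduces $(12)$ and $(13)$ from $(11)$:

```python
import sympy as sp

t = sp.symbols('t')
x = sp.Function('x')(t)
lam = sp.Function('lambda')(t)

xdot_rhs = -x**3 - lam / 2               # right-hand side of (11a)
lamdot_rhs = x * (3 * lam * x - 2)       # right-hand side of (11b)

# (13): solve (11a) for lambda in terms of x and xdot
lam_expr = sp.solve(sp.Eq(x.diff(t), xdot_rhs), lam)[0]
print(lam_expr)                          # -2*x(t)**3 - 2*Derivative(x(t), t)

# (12): differentiate (11a) once more, then substitute (11b) and (13)
xddot = xdot_rhs.diff(t)
xddot = xddot.subs(lam.diff(t), lamdot_rhs).subs(lam, lam_expr)
print(sp.expand(xddot))                  # 3*x(t)**5 + x(t)
```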

So one now instead has to find $\dot{x}(0)$ such that $x(\infty)=0$. It can be noted that $(12)$ is a second-order differential equation which only depends on the position, so one can define a potential energy as minus the "force" integrated over $x$. The total energy of the system can therefore be shown to be equal to

$$ E = \frac{1}{2} \dot{x}^2 - \frac{1}{2} x^2 (1 + x^4). \tag{14} $$
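
That $E$ is indeed conserved along $(12)$ can be checked quickly (again a short SymPy sketch of my own, not part of the original argument):

```python
import sympy as sp

t = sp.symbols('t')
x = sp.Function('x')(t)

E = sp.Rational(1, 2) * x.diff(t)**2 - sp.Rational(1, 2) * x**2 * (1 + x**4)  # eq. (14)
dEdt = E.diff(t).subs(x.diff(t, 2), x + 3 * x**5)   # impose the dynamics (12)
print(sp.simplify(dEdt))                            # 0, so E is constant along (12)
```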

For the system to go to the origin ($x(\infty)=0$ and $\dot{x}(\infty)=0$), the energy in the system has to be zero, $E=0$. Setting $(14)$ to zero and solving for $\dot{x}$ gives

$$ \dot{x} = \pm x\sqrt{1 + x^4}. \tag{15} $$

When determining the sign of $\dot{x}$ one can reason that the system should move towards the origin: if $x$ is positive, $\dot{x}$ should be negative and vice versa, so the plus-minus sign should always be taken as a minus. This can now be used to find the initial condition for the co-state by combining $(13)$ with $(15)$:

$$ \lambda(0) = -2\left(-x(0)\sqrt{1 + x(0)^4} + x(0)^3\right). \tag{16} $$

However, $(12)$ and thus $(11)$ is unstable, so tiny deviations in the initial conditions can eventually make the system blow up. Therefore it is often better to give the control policy as a function of the current state instead of only the initial state. This can be done by evaluating $(16)$ not only at $t=0$ but also at all following times; substituting this into the solution for $u$ gives the following optimal control policy:

$$ u = -x\sqrt{1 + x^4} + x^3. \tag{17} $$
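
As a quick sanity check of the feedback law $(17)$, one can simulate the closed loop numerically. The sketch below uses SciPy's `solve_ivp` with an assumed initial state $x(0)=1$ (both choices are mine, purely for illustration):

```python
import numpy as np
from scipy.integrate import solve_ivp

def u_opt(x):
    return -x * np.sqrt(1.0 + x**4) + x**3          # feedback law, eq. (17)

def closed_loop(t, y):
    x = y[0]
    return [-x**3 + u_opt(x)]                        # plant dynamics with the feedback applied

sol = solve_ivp(closed_loop, (0.0, 10.0), [1.0], max_step=0.01)

xs, ts = sol.y[0], sol.t
integrand = xs**2 + u_opt(xs)**2                     # running cost x^2 + u^2
cost = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(ts))  # trapezoidal rule
print(xs[-1], cost)   # x(10) should be close to zero; cost approximates the integral in (1)
```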


An alternative approach to this problem uses dynamic programming and the Hamilton-Jacobi-Bellman (HJB) equation.

Our problem is given as an unconstrained ($u\in \mathbb{R}$) optimal control problem with an infinite horizon (the final time $t_\text{f}$ is infinite).

Dynamic programming states the unconstrained infinite-horizon problem as

$$\text{min: } J = \dfrac{1}{2}\int_{t_0}^{\infty}\mathcal{L}(t,x,u)dt$$ $$\text{s.t.: } \dot{x}=f(t,x,u), \quad x(t_0)=x_0.$$

Here $\mathcal{L}(t,x,u) = x^2 + u^2$; the factor $\tfrac{1}{2}$ in front of the integral only rescales the cost and does not change the optimal control.

  1. Step: The solution can be obtained by solving (unconstrained optimization)

$$\dfrac{d}{du}\left[\dfrac{1}{2}\mathcal{L}+\lambda f(t,x,u)\right]=0$$

for $u$ as a function of $\lambda$, where $\lambda$ here denotes the gradient of the value function (optimal cost-to-go), $\lambda = \partial J^*/\partial x$.

  2. Step: Plugging $u(\lambda)$ into the HJB equation for the infinite horizon

$$\dfrac{1}{2}\mathcal{L}+\lambda f(t,x,u)=0.$$

  3. Step: Solve for $\lambda(x)$. Then $u(\lambda(x))=u(x)$ is the optimal control.

Applied to this problem:

  1. Step: Determine $u(\lambda)$ $$\dfrac{d}{du}\left[ \dfrac{1}{2}(x^2+u^2)+\lambda(-x^3+u)\right]=0 \implies u + \lambda = 0 \implies u = -\lambda.$$

  2. Step: Determine $\lambda(x)$ $$\dfrac{1}{2}(x^2+u^2)+\lambda(-x^3+u)=0$$ $$\dfrac{1}{2}(x^2+\lambda^2)+\lambda(-x^3-\lambda)=0$$ $$\lambda^2+2x^3\lambda-x^2=0 \implies \lambda_{1,2}=-x^3\pm \sqrt{x^6+x^2}$$

  3. Step: Determine $u(\lambda(x))=u(x)$ $$u=-\lambda \implies u(x) = x^3\mp \sqrt{x^6+x^2}$$

We have two solutions. Since $\sqrt{x^6+x^2} = |x|\sqrt{1+x^4}$, the stabilizing choice is the branch $u(x) = x^3 - x\sqrt{1+x^4}$ (the minus sign for $x>0$, the plus sign for $x<0$), which agrees with $(17)$; the other branch leads to unstable behavior. This can be shown with $V(x) = \dfrac{1}{2}x^2$ as a Lyapunov candidate function: the stabilizing branch gives $\dot{V} = x\,(-x^3 + u(x)) = -x^2\sqrt{1+x^4} < 0$ for $x \neq 0$, so the closed loop is globally asymptotically stable, while the other branch gives $\dot{V} = x^2\sqrt{1+x^4} > 0$.
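
The sign argument can also be verified numerically on a grid of states (a small check of my own, assuming NumPy):

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 401)
u_stab = x**3 - x * np.sqrt(1.0 + x**4)      # stabilizing branch, same as (17)
u_unstab = x**3 + x * np.sqrt(1.0 + x**4)    # the other branch

# Vdot = x * xdot with xdot = -x^3 + u(x), for V(x) = x^2 / 2
vdot_stab = x * (-x**3 + u_stab)             # equals -x^2*sqrt(1 + x^4) <= 0
vdot_unstab = x * (-x**3 + u_unstab)         # equals +x^2*sqrt(1 + x^4) >= 0

print(np.all(vdot_stab[x != 0] < 0), np.all(vdot_unstab[x != 0] > 0))   # True True
```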


This method does not force $u$ to take any particular form. You can also try the same procedure with other powers of $u$.