Steepest descent/gradient descent as a dynamical system

This topic has a long history. Here are some references:

  1. Bloch, Anthony M. "Steepest descent, linear programming and Hamiltonian flows." Contemp. Math. AMS 114 (1990): 77-88.

  2. Brockett, Roger W. "Dynamical systems that sort lists, diagonalize matrices and solve linear programming problems." Proceedings of the 27th IEEE Conference on Decision and Control. IEEE, 1988.

  3. Helmke, Uwe, and John B. Moore. Optimization and Dynamical Systems. Springer Science & Business Media, 2012.

Also, there are plenty of physically relevant PDEs that can be viewed as implementing gradient descent in some Banach space. For example, see

  1. Ambrosio, Luigi, Nicola Gigli, and Giuseppe Savaré. Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media, 2008.

  2. Tao, Terence. "The Euler-Arnold equation." June 2010.
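
To make the PDE remark concrete, here is one standard illustration (my own choice of example, not taken from a specific reference above): the heat equation is the $L^2$ gradient flow of the Dirichlet energy. The domain $\Omega$ and the boundary conditions (say, periodic or homogeneous Dirichlet) are assumptions needed for the integration by parts.

```latex
E(u) = \frac{1}{2} \int_\Omega |\nabla u|^2 \, dx,
\qquad
\partial_t u = -\operatorname{grad}_{L^2} E(u) = \Delta u,
\qquad
\frac{d}{dt} E(u(t)) = -\int_\Omega |\Delta u|^2 \, dx \le 0 .
```

So the energy plays the role of the objective function, and the PDE is its continuous-time steepest descent.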


I suggest viewing the function to be optimized as a local Lyapunov function for the dynamical system defined by the search procedure. There must be some literature on this point of view, but my knowledge of it is limited.
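
To illustrate the Lyapunov point of view: along the gradient flow dx/dt = -grad f(x), the chain rule gives d/dt f(x(t)) = -|grad f(x(t))|^2 <= 0, so f is non-increasing along trajectories. Below is a minimal numerical sketch of this, assuming a hypothetical quadratic objective and a forward-Euler discretization with a small step size; the matrix `A`, the starting point, and the function names are all made up for illustration.

```python
import numpy as np

# Hypothetical objective f(x) = 0.5 * x^T A x, with A symmetric positive definite.
# The matrix is an arbitrary choice for illustration only.
A = np.array([[3.0, 1.0], [1.0, 2.0]])


def f(x):
    return 0.5 * x @ A @ x


def grad_f(x):
    return A @ x


def gradient_flow_values(x0, step=0.05, n_steps=200):
    """Forward-Euler discretization of dx/dt = -grad f(x).

    Returns the values of f along the discrete trajectory.
    """
    x = np.asarray(x0, dtype=float)
    values = [f(x)]
    for _ in range(n_steps):
        x = x - step * grad_f(x)
        values.append(f(x))
    return np.array(values)


values = gradient_flow_values([2.0, -1.5])

# Lyapunov-style check: f never increases along the trajectory.
is_monotone = bool(np.all(np.diff(values) <= 0.0))
print("f non-increasing along trajectory:", is_monotone)
```

For the discrete iteration the monotone decrease only holds when the step size is small enough relative to the curvature of f; for the continuous-time flow it holds for any differentiable f.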