Model-Based Policy Iterations for Nonlinear Systems via Controlled Hamiltonian Dynamics

Sassano, M; Astolfi, A
2023-01-01

Abstract

The infinite-horizon optimal control problem for nonlinear systems is studied. In the context of model-based, iterative learning strategies, we propose an alternative definition and construction of the temporal difference error arising in policy iteration strategies. In such architectures, the error is typically computed via the evolution of the Hamiltonian function (or, possibly, of its integral) along the trajectories of the closed-loop system. Herein, the temporal difference error is instead obtained via two subsequent steps: first, the dynamics of the underlying costate variable in the Hamiltonian system is steered by means of a (virtual) control input in such a way that the stable invariant manifold becomes externally attractive. Then, the distance-from-invariance of the manifold, induced by approximate solutions, yields a natural candidate measure for the policy evaluation step. The policy improvement phase is then performed by means of standard gradient descent methods, which allow us to correctly update the weights of the underlying functional approximator. The above-mentioned architecture then yields an iterative (episodic) learning scheme based on a scalar, constant reward at each iteration, the value of which is insensitive to the length of the episode, in the original spirit of reinforcement learning strategies for discrete-time systems. Finally, the theory is validated by means of a numerical simulation involving an automatic flight control problem.
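
To make the episodic structure described in the abstract concrete, the following is a minimal, hypothetical sketch (it is not the construction developed in the paper): a scalar linear-quadratic problem in which the value function is approximated by a single weight, the policy evaluation step returns one scalar score per episode measuring how far the approximate costate graph is from being invariant (here simply the integrated Hamiltonian residual, used as a stand-in for the distance-from-invariance measure), and the policy improvement step updates the weight by gradient descent. All system data, function names, and numerical values below are assumptions introduced for illustration.

import numpy as np

# Illustrative sketch only (not the construction in the paper): a scalar LQ
# problem dx/dt = a*x + b*u with cost integral of (q*x^2 + r*u^2) dt, value
# function approximated as V(x) = w*x^2, so the approximate costate is
# p = dV/dx = 2*w*x.  All names and numbers are assumptions.
a, b, q, r = 0.5, 1.0, 1.0, 1.0

def policy(x, w):
    # Control minimizing the Hamiltonian for the current approximation:
    # u = -(b/(2r)) * dV/dx = -(b/r) * w * x
    return -(b / r) * w * x

def hamiltonian(x, w):
    # Minimized Hamiltonian evaluated on the graph p = 2*w*x; it vanishes
    # identically exactly when w solves the scalar Riccati equation, i.e.
    # when that graph is the stable invariant manifold.
    p = 2.0 * w * x
    return q * x**2 + p * a * x - (b**2 / (4.0 * r)) * p**2

def episode_score(w, x0=1.0, dt=1e-3, T=5.0):
    # Policy evaluation: one episode returns a single scalar surrogate for
    # the "distance from invariance" (the integrated |H| residual).
    x, s = x0, 0.0
    for _ in range(int(T / dt)):
        s += abs(hamiltonian(x, w)) * dt
        x += (a * x + b * policy(x, w)) * dt
    return s

# Policy improvement: finite-difference gradient descent on the weight w.
w, lr, eps = 1.0, 0.02, 1e-4   # w = 1.0 gives a stabilizing initial policy
for k in range(200):
    grad = (episode_score(w + eps) - episode_score(w)) / eps
    w -= lr * grad

w_star = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2  # Riccati solution
print(f"learned w = {w:.3f}, Riccati solution = {w_star:.3f}")

In the paper, the policy evaluation measure is instead obtained from the controlled costate dynamics that render the stable invariant manifold externally attractive; the sketch above only mirrors the outer loop, in particular the fact that each episode produces a single scalar reward for the evaluation step.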
2023
Published
International relevance
Article
Refereed by anonymous experts
Sector ING-INF/04
English
Optimal control
Trajectory
Manifolds
Nonlinear systems
Iterative methods
Adaptation models
Numerical models
Iterative learning methods
nonlinear systems
optimal control
Sassano, M., Mylvaganam, T., Astolfi, A. (2023). Model-Based Policy Iterations for Nonlinear Systems via Controlled Hamiltonian Dynamics. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 68(5), 2683-2698 [10.1109/TAC.2022.3199211].
Sassano, M; Mylvaganam, T; Astolfi, A
Journal article
Files in this product:
There are no files associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/338026
Citations
  • PMC: ND
  • Scopus: ND
  • Web of Science: 1