Model-Based Policy Iterations for Nonlinear Systems via Controlled Hamiltonian Dynamics

Sassano, M; Astolfi, A
2023-01-01

Abstract

The infinite-horizon optimal control problem for nonlinear systems is studied. In the context of model-based, iterative learning strategies, we propose an alternative definition and construction of the temporal difference error arising in policy iteration strategies. In such architectures, the error is typically computed via the evolution of the Hamiltonian function (or, possibly, of its integral) along the trajectories of the closed-loop system. Herein, the temporal difference error is instead obtained via two subsequent steps: first, the dynamics of the underlying costate variable in the Hamiltonian system is steered by means of a (virtual) control input in such a way that the stable invariant manifold becomes externally attractive. Then, the distance-from-invariance of the manifold, induced by approximate solutions, yields a natural candidate measure for the policy evaluation step. The policy improvement phase is then performed by means of standard gradient descent methods, which allow us to correctly update the weights of the underlying functional approximator. The above-mentioned architecture then yields an iterative (episodic) learning scheme based on a scalar, constant reward at each iteration, the value of which is insensitive to the length of the episode, in the original spirit of reinforcement learning strategies for discrete-time systems. Finally, the theory is validated by means of a numerical simulation involving an automatic flight control problem.
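
To make the episodic structure described in the abstract concrete, the following is a minimal, hypothetical sketch (it is not the construction developed in the paper): a scalar linear-quadratic problem in which the value function is approximated by a single weight, the policy evaluation step returns one scalar score per episode measuring how far the approximate costate graph is from being invariant (here simply the integrated Hamiltonian residual, used as a stand-in for the distance-from-invariance measure), and the policy improvement step updates the weight by gradient descent. All system data, function names, and numerical values below are assumptions introduced for illustration.

import numpy as np

# Illustrative sketch only (not the construction in the paper): a scalar LQ
# problem dx/dt = a*x + b*u with cost integral of (q*x^2 + r*u^2) dt, value
# function approximated as V(x) = w*x^2, so the approximate costate is
# p = dV/dx = 2*w*x.  All names and numbers are assumptions.
a, b, q, r = 0.5, 1.0, 1.0, 1.0

def policy(x, w):
    # Control minimizing the Hamiltonian for the current approximation:
    # u = -(b/(2r)) * dV/dx = -(b/r) * w * x
    return -(b / r) * w * x

def hamiltonian(x, w):
    # Minimized Hamiltonian evaluated on the graph p = 2*w*x; it vanishes
    # identically exactly when w solves the scalar Riccati equation, i.e.
    # when that graph is the stable invariant manifold.
    p = 2.0 * w * x
    return q * x**2 + p * a * x - (b**2 / (4.0 * r)) * p**2

def episode_score(w, x0=1.0, dt=1e-3, T=5.0):
    # Policy evaluation: one episode returns a single scalar surrogate for
    # the "distance from invariance" (the integrated |H| residual).
    x, s = x0, 0.0
    for _ in range(int(T / dt)):
        s += abs(hamiltonian(x, w)) * dt
        x += (a * x + b * policy(x, w)) * dt
    return s

# Policy improvement: finite-difference gradient descent on the weight w.
w, lr, eps = 1.0, 0.02, 1e-4   # w = 1.0 gives a stabilizing initial policy
for k in range(200):
    grad = (episode_score(w + eps) - episode_score(w)) / eps
    w -= lr * grad

w_star = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2  # Riccati solution
print(f"learned w = {w:.3f}, Riccati solution = {w_star:.3f}")

In the paper, the policy evaluation measure is instead obtained from the controlled costate dynamics that render the stable invariant manifold externally attractive; the sketch above only mirrors the outer loop, in particular the fact that each episode produces a single scalar reward for the evaluation step.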
2023
Published
International relevance
Article
Refereed by anonymous experts
Sector ING-INF/04
English
Optimal control
Trajectory
Manifolds
Nonlinear systems
Iterative methods
Adaptation models
Numerical models
Iterative learning methods
nonlinear systems
optimal control
Sassano, M., Mylvaganam, T., Astolfi, A. (2023). Model-Based Policy Iterations for Nonlinear Systems via Controlled Hamiltonian Dynamics. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 68(5), 2683-2698 [10.1109/TAC.2022.3199211].
Sassano, M; Mylvaganam, T; Astolfi, A
Journal article
Files in this product:
There are no files associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/338026
Citations
  • PMC: ND
  • Scopus: ND
  • Web of Science: 1