Possieri, C., Sassano, M. (2022). Q-Learning for Continuous-Time Linear Systems: A Data-Driven Implementation of the Kleinman Algorithm. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 1-11 [10.1109/TSMC.2022.3145693].

Q-Learning for Continuous-Time Linear Systems: A Data-Driven Implementation of the Kleinman Algorithm

Possieri, C.; Sassano, M.
2022-01-01

Abstract

A data-driven strategy is proposed for estimating the optimal feedback gain and the value function in an infinite-horizon, continuous-time, linear-quadratic optimal control problem for an unknown system. The method constructs the optimal policy without any knowledge of the model, without requiring that the time derivatives of the state be available for the design, and without even assuming that an initial stabilizing feedback policy is available. Two alternative architectures are discussed: the first scheme revolves around the periodic computation of matrix inversions involving the Q-function, whereas the second relies on a purely continuous-time implementation of dynamic systems whose trajectories are uniformly attracted by the solutions to the above algebraic equations. Interestingly, the proposed strategy essentially constitutes a (direct) data-driven implementation of the celebrated Kleinman algorithm, hence inheriting the particularly appealing features of the latter, such as quadratic, monotone convergence to the optimal solution. The theory is then validated by means of practically motivated applications.
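For background, the classic model-based Kleinman iteration that the abstract refers to can be sketched as follows. This is a minimal illustration of the iteration itself, not of the paper's data-driven scheme: it assumes the system matrices A and B and an initial stabilizing gain K0 are known, which is precisely the knowledge the paper dispenses with. Each step solves a Lyapunov equation for the cost matrix P of the current policy, then improves the gain; the iterates converge quadratically and monotonically to the Riccati solution.

```python
import numpy as np

def solve_lyapunov(Acl, Qbar):
    """Solve Acl^T P + P Acl + Qbar = 0 via Kronecker vectorization."""
    n = Acl.shape[0]
    I = np.eye(n)
    # vec(Acl^T P + P Acl) = (I (x) Acl^T + Acl^T (x) I) vec(P)
    M = np.kron(I, Acl.T) + np.kron(Acl.T, I)
    P = np.linalg.solve(M, -Qbar.flatten(order="F")).reshape((n, n), order="F")
    return (P + P.T) / 2  # symmetrize against round-off

def kleinman(A, B, Q, R, K0, iters=6):
    """Model-based Kleinman iteration for the continuous-time LQR problem."""
    K = K0
    for _ in range(iters):
        # Policy evaluation: cost matrix of the current (stabilizing) gain K
        P = solve_lyapunov(A - B @ K, Q + K.T @ R @ K)
        # Policy improvement
        K = np.linalg.solve(R, B.T @ P)
    return P, K
```

For a double integrator with Q = I and R = 1, the iteration started from the stabilizing gain K0 = [1, 1] converges in a handful of steps to the known optimal gain K* = [1, sqrt(3)].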
2022
Online ahead of print
International relevance
Article
Anonymous peer reviewers
Sector ING-INF/04 - Automatic Control
English
Convergence
Costs
Linear systems
Optimal control
Q-learning
reinforcement learning
Riccati equations
Symmetric matrices
Trajectory
uncertain/unknown systems
Possieri, C.; Sassano, M.
Journal article
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/2108/294506
Citations
  • PMC: ND
  • Scopus: 6
  • Web of Science: 6