Chaotic dynamics and convergence analysis of temporal difference algorithms with bang-bang control

dc.authoridTutsoy, Onder/0000-0001-6385-3025
dc.contributor.authorTutsoy, Önder
dc.contributor.authorBrown, Martin
dc.date.accessioned2025-01-06T17:36:32Z
dc.date.available2025-01-06T17:36:32Z
dc.date.issued2016
dc.description.abstractReinforcement learning is a powerful tool for obtaining optimal control solutions to complex sequential decision-making problems where only minimal a priori knowledge of the system dynamics is available. As such, it has also been used as a model of cognitive learning in humans and applied to systems, such as humanoid robots, to study embodied cognition. In this paper, a different approach is taken: a simple test problem is used to investigate issues associated with the value function's representation and parametric convergence. In particular, the terminal convergence problem is analyzed with a known optimal control policy, where the aim is to learn the value function accurately. For certain initial conditions, the value function is calculated explicitly and shown to have a polynomial form. It is parameterized by terms that are functions of the unknown plant's parameters and the value function's discount factor, and the convergence properties of these terms are analyzed. It is shown that the temporal difference error introduces a null space associated with the finite-horizon basis function during the experiment. The learning problem is non-singular only when the experiment's termination is handled correctly, and a number of (equivalent) solutions are described. Finally, it is demonstrated that, in general, the test problem's dynamics are chaotic for random initial states, and this causes a digital offset in the value function learning. The offset is calculated, and a dead zone is defined to switch off learning in the chaotic region. Copyright (C) 2015 John Wiley & Sons, Ltd.
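The abstract describes temporal difference learning of a polynomial value function under a fixed bang-bang policy, with a dead zone that disables updates where the closed-loop dynamics become chaotic. Below is a minimal illustrative sketch of that mechanism in Python; it is not the paper's implementation. The scalar plant, gains, basis order, discount factor, and dead-zone threshold are all assumed values chosen for the example, and the chattering region near the origin merely stands in for the chaotic region analyzed in the paper.

import numpy as np

# Minimal sketch, not the paper's implementation: TD(0) policy evaluation
# with a polynomial basis and a dead zone. All constants are assumptions.
rng = np.random.default_rng(0)

def phi(x, order=4):
    # Polynomial basis [1, x, x^2, ..., x^order] for a scalar state.
    return np.array([x**i for i in range(order + 1)])

def step(x, a=0.9, u_max=0.2):
    # Toy scalar plant x' = a*x + u under bang-bang control u = -u_max*sign(x).
    # Near the origin the input overshoots and the state chatters, a stand-in
    # for the chaotic region identified in the paper.
    return a * x - u_max * np.sign(x)

gamma = 0.8       # discount factor (assumed)
alpha = 0.05      # learning rate (assumed)
dead_zone = 0.12  # threshold below which updates are switched off; inside
                  # this region TD updates would only inject offset
w = np.zeros(5)   # weights of the polynomial value-function approximation

for episode in range(500):
    x = rng.uniform(-1.0, 1.0)
    for t in range(50):
        x_next = step(x)
        if abs(x) > dead_zone:  # dead zone: no learning inside
            f = phi(x)
            td_error = x**2 + gamma * w @ phi(x_next) - w @ f
            w += alpha * td_error * f / (1.0 + f @ f)  # normalized TD(0) step
        x = x_next

print("learned value-function weights:", w)

The normalized update keeps the step size bounded despite the higher-order polynomial features; the dead zone simply excludes the chattering region from the updates, which is the mechanism the abstract's last sentence refers to.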
dc.description.sponsorshipTurkish Ministry of National Education
dc.description.sponsorshipThis research was supported by The Turkish Ministry of National Education.
dc.identifier.doi10.1002/oca.2156
dc.identifier.endpage126
dc.identifier.issn0143-2087
dc.identifier.issn1099-1514
dc.identifier.issue1
dc.identifier.scopus2-s2.0-84954386178
dc.identifier.scopusqualityQ1
dc.identifier.startpage108
dc.identifier.urihttps://doi.org/10.1002/oca.2156
dc.identifier.urihttps://hdl.handle.net/20.500.14669/1917
dc.identifier.volume37
dc.identifier.wosWOS:000372643700005
dc.identifier.wosqualityQ1
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.language.isoen
dc.publisherWiley
dc.relation.ispartofOptimal Control Applications & Methods
dc.relation.publicationcategoryArticle - International Refereed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/closedAccess
dc.snmzKA_20241211
dc.subjectbadly conditioned learning
dc.subjectpolynomial basis functions
dc.subjectrate of convergence
dc.subjecttemporal difference learning
dc.subjectvalue function approximation
dc.titleChaotic dynamics and convergence analysis of temporal difference algorithms with bang-bang control
dc.typeArticle
