Intrinsic motivation mechanisms for a better sample efficiency in deep reinforcement learning applied to scenarios with sparse rewards

  • Author: Alain Andrés Fernández
  • Thesis supervisors: Esther Villar Rodríguez, Javier del Ser Lorente
  • Defended: at the Universidad del País Vasco - Euskal Herriko Unibertsitatea (Spain) in 2023
  • Language: English
  • Links
    • Open-access thesis at: ADDI
  • Abstract
    • Driven by the quest to create intelligent systems that can autonomously learn to make optimal decisions, Reinforcement Learning has emerged as a powerful branch of Machine Learning. Reinforcement Learning agents interact with their environment, learning from trial and error, guided by feedback signals shaped in the form of rewards. However, the application of Reinforcement Learning is often hampered by the complexity of designing such rewards. Creating a dense reward function, where the agent receives immediate and frequent feedback for its actions, is often challenging, because it requires specifying the correct behavior for every possible state-action pair. This issue parallels the challenges of human learning, where educators often grapple with identifying the best way to teach a certain skill or subject, given that learning styles vary dramatically among individuals. As a consequence, it is common to formulate problems with sparse rewards, where the agent is rewarded only when it accomplishes a significant task or achieves the final goal, aligning more directly with the objective of the problem. The sparse reward formulation does not require anticipating every possible scenario or state, making it more tractable for complex environments and real-world scenarios, where feedback is often delayed and not immediately available.

      However, sparse reward settings introduce their own challenges, most notably the issue of exploration. In the absence of frequent rewards, an agent can struggle to identify beneficial actions, making learning slow and inefficient. This is where mechanisms such as Intrinsic Motivation come into play, encouraging more effective exploration and improving sample efficiency despite the sparsity of extrinsic rewards (a minimal illustration of such a bonus is sketched after this record).

      In this context, the overall contribution of this Thesis is to delve into how Intrinsic Motivation can boost the performance of Deep Reinforcement Learning approaches in environments with sparse rewards, aiming to enhance their sample efficiency. To this end, we first focus on its application with concurrent heterogeneous agents, aiming to establish a collaborative framework that makes them explore more efficiently and accelerates their learning process. Furthermore, an entire chapter is devoted to analyzing and discussing the impact of certain design choices and parameter settings on the generation of the Intrinsic Motivation bonuses. Last but not least, the Thesis proposes to combine these explorative techniques with Self-Imitation Learning, demonstrating that they can be used jointly towards achieving faster convergence and optimal policies.

      All the analyzed scenarios suggest that Intrinsic Motivation can significantly speed up learning, reducing the number of interactions an agent needs to perform and, ultimately, leading to more rapid and efficient problem-solving in complex environments characterized by sparse rewards.
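
The sketch below is a minimal, hypothetical illustration of the general idea described in the abstract: an intrinsic bonus is added to a sparse extrinsic reward so the agent still receives a learning signal while exploring. The count-based 1/sqrt(N(s)) bonus, the beta coefficient, and the hashed state keys are illustrative assumptions only, not the specific Intrinsic Motivation mechanisms studied in the thesis.

from collections import defaultdict
import math


class CountBasedBonus:
    """Exploration bonus that decays as a state is visited more often."""

    def __init__(self, beta=0.1):
        self.beta = beta                 # assumed scaling coefficient for the intrinsic term
        self.counts = defaultdict(int)   # visitation count per state key

    def __call__(self, state_key):
        self.counts[state_key] += 1
        # Bonus shrinks with familiarity: beta / sqrt(N(s))
        return self.beta / math.sqrt(self.counts[state_key])


def shaped_reward(extrinsic, state_key, bonus_fn):
    """Reward the agent actually optimizes: sparse extrinsic + intrinsic bonus."""
    return extrinsic + bonus_fn(state_key)


if __name__ == "__main__":
    bonus = CountBasedBonus(beta=0.1)
    # Toy sparse trajectory: extrinsic reward is 0 everywhere except at the goal.
    for state, r_ext in [("s0", 0.0), ("s1", 0.0), ("s1", 0.0), ("goal", 1.0)]:
        print(state, round(shaped_reward(r_ext, state, bonus), 3))

In deep Reinforcement Learning with large or continuous state spaces, such explicit counts are commonly replaced by learned estimates such as the prediction error of a dynamics or random-network model; the design choices behind these bonuses are what the thesis analyzes.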

