Average Reward Markov Decision Process - Policy Gradient Algorithms and Regret Analysis
Centre for Networked Intelligence, IISc via YouTube
Overview
Learn about infinite-horizon average-reward Markov Decision Processes (MDPs) in this lecture by Prof. Vaneet Aggarwal of Purdue University. Explore approaches to regret guarantees under general policy parameterization, with a focus on policy gradient-based algorithms. Understand the principles of a gradient estimation technique that achieves a regret bound of O(T^0.75), and discover an efficient momentum-based approach that improves this to O(T^0.5). Examine methods for reducing the dependence of these guarantees on the MDP's mixing time. Prof. Aggarwal's research spans Reinforcement Learning, Generative AI, and Quantum Machine Learning. His honors include the 2024 IEEE William R. Bennett Prize and the 2017 Jack Neubauer Memorial Award.
Syllabus
Time: 5:00– PM
Taught by
Centre for Networked Intelligence, IISc