Average Reward Markov Decision Process - Policy Gradient Algorithms and Regret Analysis

Overview

Learn about infinite-horizon average-reward Markov Decision Processes (MDPs) in this lecture by Prof. Vaneet Aggarwal of Purdue University. The talk presents regret guarantees for policy gradient-based algorithms under general policy parameterization. It covers gradient estimation techniques that achieve a regret bound of O(T^0.75), a more efficient momentum-based approach that reaches O(T^0.5), and methods for reducing the dependence on mixing time. Prof. Aggarwal works on Reinforcement Learning, Generative AI, and Quantum Machine Learning, and his honors include the 2024 IEEE William R. Bennett Prize and the 2017 Jack Neubauer Memorial Award.
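
For readers new to the setting, the following is a brief sketch of how regret is commonly defined for average-reward MDPs, so that the quoted rates can be read in context. The notation (the policy parameter \theta, the reward r, the optimal average reward J^*) is an assumption chosen for illustration and is not taken from the lecture itself.

```latex
% Long-run average reward of a policy \pi_\theta (standard definition, assumed here):
J(\theta) = \lim_{T \to \infty} \frac{1}{T}\,
            \mathbb{E}\!\left[\sum_{t=0}^{T-1} r(s_t, a_t)\right]

% Regret over T steps, measured against the optimal average reward J^*:
\mathrm{Reg}_T = \sum_{t=0}^{T-1} \bigl(J^* - r(s_t, a_t)\bigr)

% Reading the two rates: dividing by T gives the average per-step gap.
\mathrm{Reg}_T = O(T^{3/4}) \;\Rightarrow\; \frac{\mathrm{Reg}_T}{T} = O(T^{-1/4}),
\qquad
\mathrm{Reg}_T = O(T^{1/2}) \;\Rightarrow\; \frac{\mathrm{Reg}_T}{T} = O(T^{-1/2})
```

Under this reading, both bounds are sublinear in T, so the per-step gap to the optimal average reward vanishes as T grows. The O(T^0.5) rate shrinks that gap faster.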

Syllabus

Time: 5:00– PM

Taught by

Centre for Networked Intelligence, IISc

Related Courses

Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes

Adaptive Multi-armed Bandit Algorithms for Markovian and IID Rewards

Sample-Efficient Constrained Reinforcement Learning with General Parameterized Policies

Online Learning in Markov Decision Processes - Part 2

Faster Saddle-Point Optimization for Solving Large-Scale Markov Decision Processes

Global Guarantees for Policy Gradient Methods
