
Average Reward Markov Decision Process - Policy Gradient Algorithms and Regret Analysis

Centre for Networked Intelligence, IISc via YouTube

Overview

Learn about the infinite horizon average reward Markov Decision Process (MDP) in this lecture by Prof. Vaneet Aggarwal of Purdue University. Explore novel approaches to regret guarantees under general policy parameterization, focusing on policy gradient-based algorithms. Understand the gradient estimation techniques that achieve a regret bound of O(T^0.75), and discover an efficient momentum-based approach that improves this to O(T^0.5). Examine methods for reducing the dependence on the mixing time in average reward MDP problems. The speaker brings extensive expertise in Reinforcement Learning, Generative AI, and Quantum Machine Learning, with notable recognitions including the 2024 IEEE William R. Bennett Prize and the 2017 Jack Neubauer Memorial Award.
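
To make the regret statements above concrete, here is a minimal sketch of the standard average reward objective and regret definitions for a parameterized policy π_θ (notation assumed for illustration; the lecture's exact formulation may differ):

\[
J(\theta) \;=\; \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_{\pi_\theta}\!\left[\sum_{t=0}^{T-1} r(s_t, a_t)\right],
\qquad
\mathrm{Reg}(T) \;=\; T \cdot J(\theta^*) \;-\; \sum_{t=0}^{T-1} r(s_t, a_t),
\]

where θ* maximizes J(θ) over the parameter class. A policy gradient method updates the parameters as θ_{k+1} = θ_k + η ∇̂J(θ_k) using a sample-based estimate ∇̂J of the gradient; the O(T^0.75) and O(T^0.5) bounds mentioned above describe how Reg(T) grows under the gradient estimation and momentum-based schemes discussed in the lecture.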

Syllabus

Time: 5:00– PM

Taught by

Centre for Networked Intelligence, IISc

