YouTube videos curated by Class Central.
Classroom Contents
Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes
- 1 Intro
- 2 Questions of interest
- 3 Main challenges
- 4 MDP Preliminaries
- 5 Policy parameterizations
- 6 Policy gradient algorithm
- 7 Policy gradient example: Softmax parameterization
- 8 Entropy regularization
- 9 Convergence of Entropy regularized PG
- 10 A natural solution
- 11 Proof ideas
- 12 Restricted parameterizations
- 13 A closer look at Natural Policy Gradient (NPG)
- 14 Assumptions on policies
- 15 Extension to finite samples
- 16 Looking ahead