YouTube videos curated by Class Central.
Classroom Contents
Trust Region & Proximal Policy Optimization
- 1 Gradient policy optimization
- 2 Recall Policy Gradient
- 3 Trust region method
- 4 Trust region for policies
- 5 Kullback-Leibler Divergence
- 6 Reformulation
- 7 Derivation (continued)
- 8 Trust Region Policy Optimization (TRPO)
- 9 Constrained Optimization
- 10 Simpler Objective
- 11 Proximal Policy Optimization (PPO)
- 12 Empirical Results
- 13 Illustration