Trust Region & Proximal Policy Optimization

Pascal Poupart via YouTube

Classroom Contents

  1. Gradient policy optimization
  2. Recall Policy Gradient
  3. Trust region method
  4. Trust region for policies
  5. Kullback-Leibler Divergence
  6. Reformulation
  7. Derivation (continued)
  8. Trust Region Policy Optimization (TRPO)
  9. Constrained Optimization
  10. Simpler Objective
  11. Proximal Policy Optimization (PPO)
  12. Empirical Results
  13. Illustration