Towards Structural Risk Minimization for RL - Emma Brunskill

Institute for Advanced Study via YouTube

Classroom Contents

  1. Intro
  2. Learning through Experience...
  3. Why is Risk Sensitive Control Important?
  4. Risk Sensitive Reinforcement Learning
  5. Notation: Markov Decision Process Value Function
  6. Notation: Reinforcement Learning
  7. Background: Distributional RL for Policy Evaluation & Control
  8. Background: Distributional Bellman Policy Evaluation Operator for Value Based Distributional RL
  9. Maximal Form of Wasserstein Metric on 2 Distributions (see the definitions sketched after this list)
  10. Distributional Bellman Backup Operator for Control for Maximizing Expected Reward is Not a Contraction
  11. Goal: Quickly and Efficiently use RL to Learn a Risk-Sensitive Policy using Conditional Value at Risk
  12. Conditional Value at Risk for a Decision Policy
  13. For Inspiration, look to Sample Efficient Learning for Policies that Optimize Expected Reward
  14. Optimism Under Uncertainty for Standard RL: Use Concentration Inequalities
  15. Suggests a Path for Sample Efficient Risk Sensitive RL
  16. Use DKW Concentration Inequality to Quantify Uncertainty over Distribution
  17. Creating an Optimistic Estimate of Distribution of Returns
  18. Optimism Operator Over CDF of Returns
  19. Optimistic Operator for Policy Evaluation Yields Optimistic Estimate
  20. Concerns about Optimistic Risk Sensitive RL
  21. Optimistic Exploration for Risk Sensitive RL in Continuous Spaces
  22. Recall Optimistic Operator for Distribution of Returns for Discrete State Spaces, Uses Counts
  23. Optimistic Operator for Distribution of Returns for Continuous State Spaces, Uses Pseudo-Counts
  24. Simulation Experiments
  25. Baseline Algorithms
  26. Simulation Domains
  27. Machine Replacement, Risk Level α = 0.25
  28. HIV Treatment
  29. Blood Glucose Simulator, Adult #5
  30. Blood Glucose Simulator, 3 Patients
  31. A Sidenote on Safer Exploration: Faster Learning also Reduces # of Bad Events During Learning
  32. Many Interesting Open Directions
  33. Optimism for Conservatism: Fast RL for Learning Conditional Value at Risk Policies
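Several of the chapter titles above refer to standard quantities from distributional and risk-sensitive RL. The sketch below collects their textbook forms for quick reference; it uses generic notation (return distribution Z, CDF F, risk level α) and may differ from the exact notation on the talk's slides.

```latex
% p-Wasserstein distance between two return distributions with CDFs F and G
% (one-dimensional quantile form):
\[
  d_p(F, G) = \left( \int_0^1 \bigl| F^{-1}(u) - G^{-1}(u) \bigr|^p \, du \right)^{1/p}
\]

% Maximal form over state-action pairs; the distributional Bellman
% policy-evaluation operator is a gamma-contraction in this metric:
\[
  \bar{d}_p(Z_1, Z_2) = \sup_{s,a} \, d_p\bigl( Z_1(s,a),\, Z_2(s,a) \bigr)
\]

% Conditional Value at Risk at level alpha (the worst-case lower tail of the
% return), written in its Rockafellar--Uryasev form:
\[
  \mathrm{CVaR}_\alpha(Z) = \max_{c \in \mathbb{R}} \left\{ c - \tfrac{1}{\alpha}\, \mathbb{E}\bigl[ (c - Z)_+ \bigr] \right\}
\]

% DKW concentration inequality for the empirical CDF \hat{F}_n built from n samples:
\[
  \Pr\Bigl( \sup_x \bigl| \hat{F}_n(x) - F(x) \bigr| > \varepsilon \Bigr) \le 2 e^{-2 n \varepsilon^2}
\]
% Shifting \hat{F}_n down by \varepsilon_n = \sqrt{\log(2/\delta)/(2n)} (clipped to [0,1])
% therefore gives an estimate of the return distribution that is optimistic with
% probability at least 1 - \delta, which is the kind of optimistic CDF estimate
% chapters 16-18 describe.
```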
