Better Learning from the Past - Counterfactual - Batch RL

Simons Institute via YouTube

Background: Markov Decision Process Value Function

4 of 19

YouTube videos curated by Class Central.

Classroom Contents

  1. Intro
  2. Sequential Decision Making Under Uncertainty
  3. Learning to Make Good Sequences of Decisions Under Uncertainty → 1980s Reinforcement Learning
  4. Background: Markov Decision Process Value Function
  5. Background: Reinforcement Learning
  6. Counterfactual / Batch Off-Policy Reinforcement Learning
  7. Need for Generalization
  8. Growing Interest in Causal Inference & ML
  9. Batch / Counterfactual Policy Optimization: Pick Policy w/ Best Estimated Expected Sum of Rewards
  10. Quest: Batch Policy Optimization w/ Generalization Bounds
  11. Challenge: Good Error Bound Analysis
  12. Aim: Strong Generalization Guarantees on Policy Performance, Alternative: Guarantee Find Good in Class Policy
  13. Off-Policy Policy Gradient with State Distribution Correction
  14. Aim: Strong Generalization Guarantees on Policy Performance, Alternative: Guarantee Find Best in Class Policy
  15. Example: Linear Thresholding Policies Starting HIV treatment as soon as
  16. Use an Advantage Decomposition
  17. Use a Doubly Robust Advantage Decomposition
  18. Quest for Batch Policy Optimization with Generalization Guarantees
  19. Techniques to Minimize & Understand Data Needed to Learn to Make Good Decisions