YouTube videos curated by Class Central.
Classroom Contents
Better Learning from the Past - Counterfactual - Batch RL
- 1 Intro
- 2 Sequential Decision Making Under Uncertainty
- 3 Learning to Make Good Sequences of Decisions Under Uncertainty → 1980s Reinforcement Learning
- 4 Background: Markov Decision Process Value Function
- 5 Background: Reinforcement Learning
- 6 Counterfactual / Batch Off Policy Reinforcement Learning
- 7 Need for Generalization
- 8 Growing Interest in Causal Inference & ML
- 9 Batch / Counterfactual Policy Optimization: Pick Policy w/ Best Estimated Expected Sum of Rewards
- 10 Quest: Batch Policy Optimization w/ Generalization Bounds
- 11 Challenge: Good Error Bound Analysis
- 12 Aim: Strong Generalization Guarantees on Policy Performance, Alternative: Guarantee Find Good in Class Policy
- 13 Off-Policy Policy Gradient with State Distribution Correction
- 14 Aim: Strong Generalization Guarantees on Policy Performance, Alternative: Guarantee Find Best in Class Policy
- 15 Example: Linear Thresholding Policies Starting HIV treatment as soon as
- 16 Use an Advantage Decomposition
- 17 Use a Doubly Robust Advantage Decomposition
- 18 Quest for Batch Policy Optimization with Generalization Guarantees
- 19 Techniques to Minimize & Understand Data Needed to Learn to Make Good Decisions