Overview
Syllabus
Intro
Sequential Decision Making Under Uncertainty
Learning to Make Good Sequences of Decisions Under Uncertainty → 1980s Reinforcement Learning
Background: Markov Decision Process Value Function
Background: Reinforcement Learning
Counterfactual / Batch Off-Policy Reinforcement Learning
Need for Generalization
Growing Interest in Causal Inference & ML
Batch / Counterfactual Policy Optimization: Pick Policy w/ Best Estimated Expected Sum of Rewards
Quest: Batch Policy Optimization w/ Generalization Bounds
Challenge: Good Error Bound Analysis
Aim: Strong Generalization Guarantees on Policy Performance; Alternative: Guarantee Finding a Good In-Class Policy
Off-Policy Policy Gradient with State Distribution Correction
Aim: Strong Generalization Guarantees on Policy Performance; Alternative: Guarantee Finding the Best In-Class Policy
Example: Linear Thresholding Policies Starting HIV treatment as soon as
Use an Advantage Decomposition
Use a Doubly Robust Advantage Decomposition
Quest for Batch Policy Optimization with Generalization Guarantees
Techniques to Minimize & Understand Data Needed to Learn to Make Good Decisions
Taught by
Simons Institute