ADSI Summer Workshop: Algorithmic Foundations of Learning and Control - Emma Brunskill

Overview

Explore a lecture on batch and counterfactual reinforcement learning presented by Emma Brunskill of Stanford University at the 2019 ADSI Summer Workshop on Algorithmic Foundations of Learning and Control. Learn techniques for minimizing and understanding the data needed to make good decisions, and see why covariate shift is a central challenge: different policies choose different actions and therefore induce different state distributions. Consider the legacy of reinforcement learning to benefit people, review importance sampling for batch policy evaluation, and follow the first proof of convergence to a local optimum for batch policy gradient. Examine experiments on an HIV treatment simulator, compare strong generalization guarantees on policy performance with the alternative guarantee of finding the best policy in a class, and study linear thresholding policies as an example. Gain insight into an advantage decomposition and the Advantage Doubly Robust (ADR) Estimator as steps toward batch policy optimization with generalization guarantees.
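As background for the importance sampling topic mentioned above, here is a minimal sketch of ordinary (trajectory-wise) importance sampling for batch policy evaluation. It is not taken from the lecture: the function name, the `(state, action, reward)` episode format, and the two policy callables are assumptions made for illustration only.

```python
def importance_sampling_estimate(trajectories, target_policy, behavior_policy, gamma=1.0):
    """Estimate the target policy's expected return from logged episodes.

    Hypothetical interface (an assumption, not from the lecture):
      trajectories    -- list of episodes, each a list of (state, action, reward)
      target_policy   -- function (state, action) -> probability under the policy to evaluate
      behavior_policy -- function (state, action) -> probability under the policy that logged the data
    """
    estimates = []
    for episode in trajectories:
        weight = 1.0  # product of per-step probability ratios
        ret = 0.0     # discounted return of the logged episode
        for t, (state, action, reward) in enumerate(episode):
            weight *= target_policy(state, action) / behavior_policy(state, action)
            ret += (gamma ** t) * reward
        # Reweight the logged return as if it had been generated by the target policy.
        estimates.append(weight * ret)
    return sum(estimates) / len(estimates)


# Toy one-step example: data logged by a uniform policy over two actions,
# evaluated for a policy that always picks action 1.
logged = [[(0, 1, 1.0)], [(0, 0, 0.0)]]
uniform = lambda s, a: 0.5
always_one = lambda s, a: 1.0 if a == 1 else 0.0
value = importance_sampling_estimate(logged, always_one, uniform)
```

The sketch assumes the behavior policy gives nonzero probability to every logged action. The estimate is unbiased under that assumption, but its variance can grow quickly with episode length because the weight is a product of ratios.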

Syllabus

Intro
Legacy of Reinforcement Learning to Benefit People
Techniques to Minimize & Understand Data Needed to Learn to Make Good Decisions
Challenge: Covariate Shift (Different Policies, Different Actions, Different State Distributions)
Quest: Batch Policy Optimization w/ Generalization Bounds
Recall: Importance Sampling for RL Batch Policy Evaluation
1st Proof of Convergence to a Local Optimum for Batch Policy Gradient
Experiment Settings
HIV treatment simulator
Aim: Strong Generalization Guarantees on Policy Performance; Alternative: Guarantee to Find Best-in-Class Policy
Example: Linear Thresholding Policies
An Advantage Decomposition
Advantage Doubly Robust (ADR) Estimator
Quest for Batch Policy Optimization with Generalization Guarantees

Taught by

Paul G. Allen School
