Towards Structural Risk Minimization for RL - Emma Brunskill
Institute for Advanced Study via YouTube
Overview
Syllabus
Intro
Learning through Experience...
Why is Risk Sensitive Control Important?
Risk Sensitive Reinforcement Learning
Notation: Markov Decision Process Value Function
Notation: Reinforcement Learning
Background: Distributional RL for Policy Evaluation & Control
Background: Distributional Bellman Policy Evaluation Operator for Value Based Distributional RL (sketched below)
Maximal Form of Wasserstein Metric on 2 Distributions
Distributional Bellman Backup Operator for Control for Maximizing Expected Reward is Not a Contraction
Goal: Quickly and Efficiently use RL to Learn a Risk-Sensitive Policy using Conditional Value at Risk
Conditional Value at Risk for a Decision Policy (defined below)
For Inspiration, look to Sample Efficient Learning for Policies that Optimize Expected Reward
Optimism Under Uncertainty for Standard RL: Use Concentration Inequalities (bonus sketched below)
Suggests a Path for Sample Efficient Risk Sensitive RL
Use DKW Concentration Inequality to Quantify Uncertainty over Distribution (inequality stated below)
Creating an Optimistic Estimate of Distribution of Returns
Optimism Operator Over CDF of Returns (code sketch below)
Optimistic Operator for Policy Evaluation Yields Optimistic Estimate
Concerns about Optimistic Risk Sensitive RL
Optimistic Exploration for Risk Sensitive RL in Continuous Spaces
Recall Optimistic Operator for Distribution of Returns for Discrete State Spaces, Uses Counts
Optimistic Operator for Distribution of Returns for Continuous State Spaces, Uses Pseudo-Counts
Simulation Experiments
Baseline Algorithms
Simulation Domains
Machine Replacement, Risk level α = 0.25
HIV Treatment
Blood Glucose Simulator, Adult #5
Blood Glucose Simulator, 3 Patients
A Sidenote on Safer Exploration: Faster Learning also Reduces # of Bad Events During Learning
Many Interesting Open Directions
Optimism for Conservatism: Fast RL for Learning Conditional Value at Risk Policies
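For reference on the distributional RL background segments above, here is a standard sketch of the distributional Bellman policy evaluation operator and of the maximal (supremum) form of the Wasserstein metric; the notation R, P, γ, Z follows the usual MDP conventions rather than the talk's slides.

\[
(\mathcal{T}^{\pi} Z)(s,a) \;\overset{D}{=}\; R(s,a) + \gamma\, Z(S', A'),
\qquad S' \sim P(\cdot \mid s,a),\ A' \sim \pi(\cdot \mid S'),
\]
\[
\bar{d}_p(Z_1, Z_2) \;=\; \sup_{s,a}\, d_p\big(Z_1(s,a),\, Z_2(s,a)\big),
\qquad
d_p(F, G) \;=\; \Big(\int_0^1 \big|F^{-1}(u) - G^{-1}(u)\big|^p\, du\Big)^{1/p}.
\]

The policy evaluation operator \(\mathcal{T}^{\pi}\) is a \(\gamma\)-contraction in \(\bar{d}_p\), which is what makes value-based distributional policy evaluation well behaved; the control backup that greedily maximizes expected reward is not a contraction in this metric, as the syllabus notes.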
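For the conditional value at risk segments: writing \(Z^{\pi}\) for the random return of policy \(\pi\) and \(F_{Z^{\pi}}\) for its CDF, the standard definitions at risk level \(\alpha \in (0, 1]\) are

\[
\mathrm{VaR}_{\alpha}(Z^{\pi}) = \inf\{\, z : F_{Z^{\pi}}(z) \ge \alpha \,\},
\qquad
\mathrm{CVaR}_{\alpha}(Z^{\pi}) = \frac{1}{\alpha}\int_0^{\alpha} F_{Z^{\pi}}^{-1}(u)\, du,
\]

which for continuous \(F_{Z^{\pi}}\) equals \(\mathbb{E}\big[Z^{\pi} \mid Z^{\pi} \le \mathrm{VaR}_{\alpha}(Z^{\pi})\big]\). Setting \(\alpha = 1\) recovers the ordinary expected return, while small \(\alpha\) (for example the \(\alpha = 0.25\) used in the machine replacement experiment) focuses the objective on the worst-case tail of returns.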
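For the standard-RL optimism segment, a representative count-based construction (a sketch, not necessarily the exact bonus used in the talk): a Hoeffding-style concentration inequality turns the visit count \(n(s,a)\) into an exploration bonus,

\[
\tilde{Q}(s,a) \;=\; \hat{r}(s,a) \;+\; c\sqrt{\frac{\log(1/\delta)}{n(s,a)}} \;+\; \gamma \sum_{s'} \hat{P}(s' \mid s,a)\, \max_{a'} \tilde{Q}(s',a'),
\]

so rarely visited state-action pairs look optimistically valuable and keep being explored until their counts grow; the syllabus suggests mirroring this recipe at the level of return distributions.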
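The Dvoretzky-Kiefer-Wolfowitz (DKW) inequality invoked above bounds, uniformly over the return axis, how far an empirical CDF \(\hat{F}_n\) built from \(n\) i.i.d. samples can be from the true CDF \(F\):

\[
\Pr\Big(\sup_{z}\, \big|\hat{F}_n(z) - F(z)\big| > \varepsilon\Big) \;\le\; 2e^{-2n\varepsilon^2},
\qquad\text{so with probability at least } 1-\delta,\quad
\sup_{z}\, \big|\hat{F}_n(z) - F(z)\big| \;\le\; \sqrt{\frac{\ln(2/\delta)}{2n}}.
\]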
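One way to realize the optimism operator over the CDF of returns is to shift the empirical CDF down by the DKW width and clip at zero, which pushes probability mass toward the top of the return support and can only raise the estimated CVaR. The Python sketch below illustrates this idea; the function names, the support grid, the constant c, and the toy data are our own illustrative choices, not taken from the talk.

import numpy as np

def dkw_width(n, delta):
    # DKW confidence width for an empirical CDF built from n i.i.d. samples.
    return np.sqrt(np.log(2.0 / delta) / (2.0 * n))

def optimistic_cdf(emp_cdf, n, delta, c=1.0):
    # Shift the empirical CDF (evaluated on a fixed, increasing return grid)
    # down by c times the DKW width and clip at 0; forcing the last value back
    # to 1 places the removed mass at the top of the support, so the shifted
    # return distribution is (first-order stochastically) optimistic.
    shifted = np.clip(emp_cdf - c * dkw_width(n, delta), 0.0, 1.0)
    shifted[-1] = 1.0
    return shifted

def cvar_from_cdf(cdf, support, alpha, k=100):
    # Approximate CVaR_alpha as the average of the quantile function on (0, alpha].
    us = alpha * np.arange(1, k + 1) / k
    idx = np.minimum(np.searchsorted(cdf, us), len(support) - 1)
    return support[idx].mean()

# Toy usage: the optimistic CVaR is always at least the plain empirical CVaR.
samples = np.random.default_rng(0).normal(0.0, 3.0, size=50)
support = np.linspace(samples.min() - 1.0, samples.max() + 1.0, 201)
emp_cdf = (samples[None, :] <= support[:, None]).mean(axis=1)
opt_cdf = optimistic_cdf(emp_cdf, n=len(samples), delta=0.05)
print(cvar_from_cdf(emp_cdf, support, 0.25), cvar_from_cdf(opt_cdf, support, 0.25))

The DKW width shrinks as \(1/\sqrt{n}\), so the optimistic and empirical estimates coincide as data accumulates; the constant c plays the same role as the exploration constant in the standard-RL bonus above.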
Taught by
Institute for Advanced Study