Class Central Classrooms beta
YouTube videos curated by Class Central.
Classroom Contents
Towards Structural Risk Minimization for RL - Emma Brunskill
- 1 Intro
- 2 Learning through Experience...
- 3 Why is Risk Sensitive Control Important?
- 4 Risk Sensitive Reinforcement Learning
- 5 Notation: Markov Decision Process Value Function
- 6 Notation: Reinforcement Learning
- 7 Background: Distributional RL for Policy Evaluation & Control
- 8 Background: Distributional Bellman Policy Evaluation Operator for Value Based Distributional RL
- 9 Maximal Form of Wasserstein Metric on 2 Distributions
- 10 Distributional Bellman Backup Operator for Control for Maximizing Expected Reward is Not a Contraction
- 11 Goal: Quickly and Efficiently use RL to Learn a Risk-Sensitive Policy using Conditional Value at Risk
- 12 Conditional Value at Risk for a Decision Policy
- 13 For Inspiration, look to Sample Efficient Learning for Policies that Optimize Expected Reward
- 14 Optimism Under Uncertainty for Standard RL: Use Concentration Inequalities
- 15 Suggests a Path for Sample Efficient Risk Sensitive RL
- 16 Use DKW Concentration Inequality to Quantify Uncertainty over Distribution (see the sketch after this list)
- 17 Creating an Optimistic Estimate of Distribution of Returns
- 18 Optimism Operator Over CDF of Returns
- 19 Optimistic Operator for Policy Evaluation Yields Optimistic Estimate
- 20 Concerns about Optimistic Risk Sensitive RL
- 21 Optimistic Exploration for Risk Sensitive RL in Continuous Spaces
- 22 Recall Optimistic Operator for Distribution of Returns for Discrete State Spaces, Uses Counts
- 23 Optimistic Operator for Distribution of Returns for Continuous State Spaces, Uses Pseudo-Counts
- 24 Simulation Experiments
- 25 Baseline Algorithms
- 26 Simulation Domains
- 27 Machine Replacement, Risk level α = 0.25
- 28 HIV Treatment
- 29 Blood Glucose Simulator, Adult #5
- 30 Blood Glucose Simulator, 3 Patients
- 31 A Sidenote on Safer Exploration: Faster Learning also Reduces # of Bad Events During Learning
- 32 Many Interesting Open Directions
- 33 Optimism for Conservatism: Fast RL for Learning Conditional Value at Risk Policies
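
The chapters above name two standard objects at the core of the talk: the CVaR risk measure (chapter 12) and the DKW concentration inequality used to build an optimistic estimate of the return distribution (chapters 16-18). As a minimal sketch of how those pieces fit together, here is one way to compute an optimistic CVaR from Monte Carlo return samples; it assumes a known upper bound `r_max` on returns, and the function name and interface are illustrative, not taken from the talk.

```python
# Sketch: optimistic CVaR estimate via the DKW inequality (illustrative only).
import numpy as np

def optimistic_cvar(returns, alpha=0.25, delta=0.05, r_max=1.0):
    """Optimistic estimate of CVaR_alpha from sampled returns.

    DKW: with prob. >= 1 - delta, sup_x |F_hat(x) - F(x)| <= sqrt(ln(2/delta)/(2n)),
    so lowering the empirical CDF by that width yields a distribution that
    stochastically dominates the truth, i.e. an optimistic return distribution.
    """
    x = np.sort(np.asarray(returns, dtype=float))
    n = x.size
    eps = np.sqrt(np.log(2.0 / delta) / (2.0 * n))   # DKW confidence width
    cdf = np.arange(1, n + 1) / n                    # empirical CDF at the samples
    opt_cdf = np.clip(cdf - eps, 0.0, 1.0)           # optimistic (lowered) CDF
    # Per-sample mass under the optimistic CDF; the freed-up mass is moved
    # to r_max, the assumed upper bound on the return (optimism).
    probs = np.append(np.diff(np.concatenate(([0.0], opt_cdf))), 1.0 - opt_cdf[-1])
    vals = np.append(x, r_max)
    # CVaR_alpha = (1/alpha) * integral_0^alpha F^{-1}(u) du: average the
    # worst alpha fraction of mass under the optimistic distribution.
    cum = np.cumsum(probs)
    k = int(np.searchsorted(cum, alpha))
    head = cum[k - 1] if k > 0 else 0.0
    mass = np.append(probs[:k], alpha - head)        # exactly alpha total mass
    return float(np.dot(mass, vals[:k + 1]) / alpha)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.normal(0.5, 0.2, size=500)
    print(optimistic_cvar(samples))   # exceeds the plain empirical CVaR by construction
```

With few samples the DKW width is large, the estimate leans heavily on `r_max`, and the agent is pushed to explore; as the sample count grows the optimistic estimate converges to the true CVaR. This is the usual optimism-under-uncertainty mechanism, here adapted to a distributional, risk-sensitive objective.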