Classroom Contents
Towards Structural Risk Minimization for RL - Emma Brunskill
- 1 Intro
- 2 Learning through Experience...
- 3 Why is Risk Sensitive Control Important?
- 4 Risk Sensitive Reinforcement Learning
- 5 Notation: Markov Decision Process Value Function
- 6 Notation: Reinforcement Learning
- 7 Background: Distributional RL for Policy Evaluation & Control
- 8 Background: Distributional Bellman Policy Evaluation Operator for Value Based Distributional RL
- 9 Maximal Form of Wasserstein Metric on 2 Distributions
- 10 Distributional Bellman Backup Operator for Control for Maximizing Expected Reward is Not a Contraction
- 11 Goal: Quickly and Efficiently use RL to Learn a Risk-Sensitive Policy using Conditional Value at Risk
- 12 Conditional Value at Risk for a Decision Policy (a reference definition follows the list)
- 13 For Inspiration, look to Sample Efficient Learning for Policies that Optimize Expected Reward
- 14 Optimism Under Uncertainty for Standard RL: Use Concentration Inequalities
- 15 Suggests a Path for Sample Efficient Risk Sensitive RL
- 16 Use DKW Concentration Inequality to Quantify Uncertainty over Distribution (the DKW bound is restated after the list)
- 17 Creating an Optimistic Estimate of Distribution of Returns
- 18 Optimism Operator Over CDF of Returns
- 19 Optimistic Operator for Policy Evaluation Yields Optimistic Estimate
- 20 Concerns about Optimistic Risk Sensitive RL
- 21 Optimistic Exploration for Risk Sensitive RL in Continuous Spaces
- 22 Recall Optimistic Operator for Distribution of Returns for Discrete State Spaces, Uses Counts (see the Python sketch after the list)
- 23 Optimistic Operator for Distribution of Returns for Continuous State Spaces, Uses Pseudo-Counts
- 24 Simulation Experiments
- 25 Baseline Algorithms
- 26 Simulation Domains
- 27 Machine Replacement, Risk Level α = 0.25
- 28 HIV Treatment
- 29 Blood Glucose Simulator, Adult #5
- 30 Blood Glucose Simulator, 3 Patients
- 31 A Sidenote on Safer Exploration: Faster Learning also Reduces # of Bad Events During Learning
- 32 Many Interesting Open Directions
- 33 Optimism for Conservatism: Fast RL for Learning Conditional Value at Risk Policies
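
For reference alongside items 11 and 12, a standard definition of the conditional value at risk of the return $Z^\pi$ at level $\alpha$ (stated here for orientation, not quoted from the slides) is

$$
\mathrm{VaR}_\alpha(Z^\pi) = \inf\{\, z : F_{Z^\pi}(z) \ge \alpha \,\}, \qquad
\mathrm{CVaR}_\alpha(Z^\pi) = \mathbb{E}\!\left[\, Z^\pi \;\middle|\; Z^\pi \le \mathrm{VaR}_\alpha(Z^\pi) \,\right],
$$

i.e., the expected return over the worst $\alpha$-fraction of outcomes under the policy.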
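For items 16–18, the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality referenced in the talk bounds how far an empirical CDF $\hat F_n$, built from $n$ return samples, can be from the true CDF $F$:

$$
\Pr\!\left( \sup_{z} \big| \hat F_n(z) - F(z) \big| > \varepsilon \right) \le 2 e^{-2 n \varepsilon^2},
\qquad\text{so with probability } 1-\delta:\;
\sup_{z} \big| \hat F_n(z) - F(z) \big| \le \sqrt{\tfrac{\ln(2/\delta)}{2n}}.
$$

Shifting $\hat F_n$ down by this count-based radius and concentrating the removed mass on the largest achievable return yields a CDF whose CVaR can only be larger, i.e., an optimistic estimate of the distribution of returns.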
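The Python sketch below illustrates the kind of count-based optimistic operator over the distribution of returns named in items 17, 18, and 22, applied to return samples from a single state-action pair. The function name `optimistic_cvar`, the `v_max` fallback, and the exact bookkeeping of the removed mass are illustrative assumptions, not the talk's implementation.

```python
import numpy as np

def optimistic_cvar(returns, alpha=0.25, delta=0.05, v_max=None):
    """Sketch of a DKW-based optimistic CVaR estimate from sampled returns.

    The empirical CDF is shifted down by the DKW radius eps = sqrt(ln(2/delta) / (2n)),
    and the removed probability mass is placed on v_max (the best achievable return),
    which can only increase the estimated CVaR_alpha (optimism for the lower tail).
    """
    z = np.sort(np.asarray(returns, dtype=float))   # sampled returns, ascending
    n = len(z)
    if v_max is None:
        v_max = z[-1]                               # fallback: best observed return
    eps = np.sqrt(np.log(2.0 / delta) / (2.0 * n))  # DKW confidence radius (count-based)

    probs = np.full(n, 1.0 / n)                     # empirical distribution over samples

    # Remove up to eps of probability mass from the worst (lowest) returns ...
    removed = 0.0
    for i in range(n):
        take = min(probs[i], eps - removed)
        probs[i] -= take
        removed += take
        if removed >= eps:
            break

    # ... and place it on the most optimistic outcome v_max.
    z = np.append(z, v_max)
    probs = np.append(probs, removed)

    # CVaR_alpha of the shifted distribution: mean of its worst alpha-fraction.
    tail_mass, tail_sum = 0.0, 0.0
    for zi, pi in zip(z, probs):
        take = min(pi, alpha - tail_mass)
        tail_sum += take * zi
        tail_mass += take
        if tail_mass >= alpha:
            break
    return tail_sum / alpha


# Example: optimistic CVaR at alpha = 0.25 from a handful of sampled returns.
print(optimistic_cvar([0.0, 1.0, 1.5, 2.0, 4.0], alpha=0.25, v_max=10.0))
```

With few samples the DKW radius is large, so the optimistic estimate sits well above the plain empirical CVaR; as counts grow, the radius shrinks and the two estimates converge, which is the exploration mechanism the talk attributes to this operator.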