Overview
Syllabus
Intro
RL beyond simulated environments?
Tuning the Swiss Free Electron Laser [with Kirschner, Mutný, Hiller, Ischebeck et al.]
Challenge: Safety Constraints
Safe optimization
Safe Bayesian optimization
Illustration of Gaussian Process Inference [cf. Rasmussen & Williams 2006]
Plausible maximizers
Certifying Safety
Confidence intervals for GPs?
Online tuning of 24 parameters
Shortcomings of SafeOpt
Safe learning for dynamical systems [Koller, Berkenkamp, Turchetta, K, CDC 18, 19]
Stylized task
Planning with confidence bounds [Koller, Berkenkamp, Turchetta, K, CDC 18, 19]
Forward-propagating uncertain, nonlinear dynamics
Challenges with long-term action dependencies
Safe learning-based MPC
Experimental illustration
Scaling up: Efficient Optimistic Exploration in Deep Model-based Reinforcement Learning
Optimism in Model-based Deep RL
Deep Model-based RL with Confidence: H-UCRL [Curi, Berkenkamp, K, NeurIPS 20]
Illustration on Inverted Pendulum
Deep RL: Mujoco Half-Cheetah
Action penalty effect
What about safety?
Safety-Gym Benchmark Suite
Which priors to choose? → PAC-Bayesian Meta-Learning [Rothfuss, Fortuin, Josifoski, K, ICML 2021]
Experiments - Predictive accuracy (Regression)
Meta-Learned Priors for Bayesian Optimization
Meta-Learned Priors for Sequential Decision Making
Safe and efficient exploration in real-world RL
Acknowledgments
Taught by
Fields Institute