Overview
Syllabus
Intro
RL beyond simulated environments?
Tuning the Swiss Free Electron Laser [with Kirschner, Mutný, Hiller, Ischebeck et al.]
Challenge: Safety Constraints
Safe optimization
Safe Bayesian optimization
Illustration of Gaussian Process Inference [cf. Rasmussen & Williams 2006]
Plausible maximizers
Certifying Safety
Confidence intervals for GPs?
Online tuning of 24 parameters
Shortcomings of SafeOpt
Safe learning for dynamical systems [Koller, Berkenkamp, Turchetta, K, CDC 18, 19]
Stylized task
Planning with confidence bounds [Koller, Berkenkamp, Turchetta, K, CDC 18, 19]
Forward-propagating uncertain, nonlinear dynamics
Challenges with long-term action dependencies
Safe learning-based MPC
Experimental illustration
Scaling up: Efficient Optimistic Exploration in Deep Model-based Reinforcement Learning
Optimism in Model-based Deep RL
Deep Model-based RL with Confidence: H-UCRL [Curi, Berkenkamp, K, NeurIPS 20]
Illustration on Inverted Pendulum
Deep RL: Mujoco Half-Cheetah
Action penalty effect
What about safety?
Safety-Gym Benchmark Suite
Which priors to choose? → PAC-Bayesian Meta-Learning [Rothfuss, Fortuin, Josifoski, K, ICML 2021]
Experiments - Predictive accuracy (Regression)
Meta-Learned Priors for Bayesian Optimization
Meta-Learned Priors for Sequential Decision Making
Safe and efficient exploration in real-world RL
Acknowledgments
Taught by
Fields Institute