Overview
Syllabus
Intro
Bird's-eye view of RL
Illustrative application: RL in personal health
General thrust
Direction: Exploiting structure in RL
Vignette: Q-learning with low rank structure
Vignette: Model-free versus model-based methods
Estimate dynamics or value functions for LQR? - Linear state-space model with quadratic reward function
Performance of LSTD versus model-based methods
Direction: Exploration/exploitation beyond bandits
Vignette: Q-learning with UCB
Vignette: UCB and Monte Carlo Tree Search
Direction: From worst-case to instance-optimality
Vignette: Instance-optimality of TD learning?
Instance-optimality in policy evaluation
Direction: RL in offline settings and causal inference
Some future directions exploiting methods from causal inference: instrumental variables, propensity scores, doubly robust methods, synthetic controls
Taught by
Simons Institute