Overview
Syllabus
Intro
Reinforcement Learning (RL) Applications
Value-function Approximation
Comparison between SL and RL
Markov Decision Process (MDP)
Batch learning in MDPs
Example: Video game playing
Batch learning in large MDPs
Assumption on data (?)
Assumption on data & MDP dynamics
Algorithm for batch RL
How things go wrong (w/ restricted class)
Fix using a strong assumption ("completeness")
Realizability alone is insufficient?
Proving the conjecture: Attempt 1
Checklist for a plausible construction
Importance of the conjecture
Importance of the construction
Taught by
Simons Institute