On the Statistical Complexity of Reinforcement Learning
Institute for Pure & Applied Mathematics (IPAM) via YouTube
Overview
Syllabus
Intro
Tabular Markov decision process
Prior efforts: algorithms and sample complexity results
Minimax optimal sample complexity of tabular MDP
Adding some structure: state feature map
Representing value function using linear combination of features
Rethinking Bellman equation
Reducing Bellman equation using features
Sample complexity of RL with features
Off-Policy Policy Evaluation (OPE)
OPE with function approximation
Equivalence to plug-in estimation
Minimax-optimal batch policy evaluation
Lower Bound Analysis
Episodic Reinforcement Learning
Feature space embedding of transition kernel
Regret Analysis
Exploration with Value-Targeted Regression (VTR)
Taught by
Institute for Pure & Applied Mathematics (IPAM)