Overview
Syllabus
Intro
Bandit Problem
Our focus: beyond linearity and concavity
Problem I: the Stochastic Bandit Eigenvector Problem
Some related work
Information-theoretic understanding
Beyond cubic dimension dependence
Our method: noisy power method
Problem II: Stochastic Low-rank linear reward
Our algorithm: noisy subspace iteration
Regret comparisons: quadratic reward
Higher-order problems
Problem III: Symmetric High-order Polynomial bandit
Problem IV: Asymmetric High-order Polynomial bandit
Lower bound: Optimal dependence on a
Overall Regret Comparisons
Extension to RL in simulator setting
Conclusions: We find optimal regret for different types of reward functions
Future directions
Taught by
Simons Institute