Syllabus
Intro
The Basic Bandit Game
Bandits are Super Simple MDPs
The Regret
Adversarial Protocol
Algorithm Design Principle: Exponential Weights
Exp3: Abridged Analysis
Exp3: Analysis
Upgrades
Warm-up: Explore-Then-Commit
Algorithm Design Principle: OFU
UCB Illustration
UCB: Analysis
Algorithm Design Principle: Probability Matching
Thompson Sampling: Overview
Thompson Sampling: Upper Bound
Thompson Sampling: Proof Outline
Best of Both Worlds
Two Settings
Algorithm Design Principle: Action Elimination
Successive Elimination Analysis
Bonus: Linear Contextual Bandits
Algorithm Design Principle: Optimism
Review
Taught by
Simons Institute