Online Learning and Bandits - Part 2

Overview

This lecture from the Theory of Reinforcement Learning Boot Camp covers online learning and bandit algorithms. It introduces the basic bandit game, regret, and the adversarial protocol, then presents three algorithm design principles: exponential weights, optimism in the face of uncertainty (OFU), and probability matching. Each principle is paired with an algorithm (Exp3, UCB, and Thompson Sampling, respectively), together with its analysis and upper bounds. Later segments cover best-of-both-worlds results, successive elimination, and linear contextual bandits. The lecture is presented by Alan Malek (DeepMind) and Wouter Koolen (Centrum Wiskunde & Informatica).

Syllabus

Intro
The Basic Bandit Game
Bandits are Super Simple MDP
The Regret
Adversarial Protocol
Algorithm Design Principle: Exponential Weights
Exp3: Abridged Analysis
Exp3: Analysis
Upgrades
Warm-up: Explore-Then-Commit
Algorithm Design Principle: OFU
UCB Illustration
UCB: Analysis
Algorithm Design Principle: Probability Matching
Thompson Sampling: Overview
Thompson Sampling: Upper Bound
Thompson Sampling: Proof Outline
Best of Both Worlds
Two Settings
Algorithm Design Principle: Action Elimination
Successive Elimination Analysis
Bonus: Linear Contextual Bandits
Algorithm Design Principle: Optimism
Review

Taught by

Simons Institute


Related Courses

Bandit Algorithm (Online Machine Learning)

Online Learning and Bandits - Part 1

Bayesian and Contextual Bandits

Deep Bayesian Bandits - Exploring in Online Personalized Recommendations

Reinforcement Learning - Part I

Provably Efficient Reinforcement Learning with Linear Function Approximation - Chi Jin
