Online Learning and Bandits - Part 1

Overview

Explore the fundamentals of online learning and bandit algorithms in this lecture from the Theory of Reinforcement Learning Boot Camp. Delve into key concepts such as full information online learning, online gradient descent, the exponential weights algorithm, and follow the regularized leader. Examine the balance between model complexity and overfitting, and discover applications in offline optimization and computing saddle points. Alan Malek of DeepMind and Wouter Koolen of Centrum Wiskunde & Informatica guide you through working definitions, design principles, and performance analyses of these algorithms. Gain insight into the theoretical foundations of reinforcement learning and their practical implications in this hour-long presentation.

Syllabus

Intro
Positioning this Tutorial
Working Definitions
Full Information Online Learning
Setup
OCO Problem
Design Principle
Online Gradient Descent (OGD) Algorithm
Online Gradient Descent Result
Proof of OGD regret bound (ctd)
OGD Discussion
From Learning Parameters to Picking Actions
Let's apply what we know
Exponential Weights (EW) / Hedge Algorithm
EW Analysis: applying Hoeffding's Lemma to the loss of each round
Summary so far: balancing act "model complexity" vs "overfitting"
FTRL/MD "sneak peek"
FTRL/MD "sneak peek" performance: Follow the Regularised Leader (FTRL) algorithm
Quadratic Losses
Curvature assumptions
ONS Algorithm
ONS Performance
ONS Discussion
Offline Optimisation
Online to Batch (assumption: stochastic setting)
Computing Saddle Points
Application 3: Saddle Point Algorithm (approximate saddle point solver)
Application 3: Saddle Point Analysis
Conclusion

Taught by

Simons Institute
