Optimal Gradient-Based Algorithms for Non-Concave Bandit Optimization

Simons Institute via YouTube

Overview

Explore a 31-minute lecture by Qi Lei of Princeton University on optimal gradient-based algorithms for non-concave bandit optimization. The talk covers the stochastic bandit eigenvector problem, the noisy power method, and stochastic low-rank linear rewards. It examines the information-theoretic understanding of these problems, compares regret bounds for quadratic rewards, and moves on to higher-order polynomial bandit problems. Learn about optimal regret for different types of reward functions and a possible extension to reinforcement learning in the simulator setting. Gain insight into current research on sampling algorithms and geometries on probability distributions, presented at the Simons Institute.

Syllabus

Intro
Bandit Problem
Our focus: beyond linearity and concavity
Problem I: the Stochastic Bandit Eigenvector Problem
Some related work
Information theoretical understanding
Beyond cubic dimension dependence
Our method: noisy power method
Problem II: Stochastic Low-rank linear reward
Our algorithm: noisy subspace iteration
Regret comparisons: quadratic reward
Higher-order problems
Problem III: Symmetric High-order Polynomial bandit
Problem IV: Asymmetric High-order Polynomial bandit
Lower bound: Optimal dependence on a
Overall Regret Comparisons
Extension to RL in simulator setting
Conclusions: We find optimal regret for different types of reward function
Future directions

Taught by

Simons Institute
