Optimal Gradient-Based Algorithms for Non-Concave Bandit Optimization

Overview

Explore a 31-minute lecture by Qi Lei from Princeton University on optimal gradient-based algorithms for non-concave bandit optimization. Delve into advanced topics including the stochastic bandit eigenvector problem, noisy power methods, and stochastic low-rank linear rewards. Examine information theoretical understanding, regret comparisons for quadratic rewards, and higher-order polynomial bandit problems. Learn about optimal regret for different types of reward functions and potential extensions to reinforcement learning in simulator settings. Gain insights into cutting-edge research in sampling algorithms and geometries on probability distributions presented at the Simons Institute.

Syllabus

Intro
Bandit Problem
Our focus: beyond linearity and concavity
Problem li the Stochastic Bandit Eigenvector Problem
Some related work
Information theoretical understanding
Beyond cubic dimension dependence
Our methodnoisy power method
Problem i Stochastic Low-rank linear reward
Our algorithm: noisy subspace iteration
Regret comparisons: quadratic reward
Higher-order problems
Problem : Symmetric High-order Polynomial bandit
Problem IV: Asymmetric High-order Polynomial bandit
Lower bound: Optimal dependence on a
Overall Regret Comparisons
Extension to RL in simulator setting
Conclusions We find optimal regret for different types of reward function
Future directions

Taught by

Simons Institute

Reviews

Start your review of Optimal Gradient-Based Algorithms for Non-Concave Bandit Optimization

Taught by

10 Best Machine Learning Courses for 2024: Scikit-learn, TensorFlow, and more

Never Stop Learning.