Adaptive Multi-armed Bandit Algorithms for Markovian and IID Rewards

Overview

This technical lecture covers multi-armed bandit (MAB) algorithms for both Markovian and independent and identically distributed (i.i.d.) rewards. It examines why regret guarantees are hard to obtain when arm rewards form Markov chains that fall outside single-parameter exponential families. It then presents an algorithm that uses a total variation distance-based test to identify whether rewards are Markovian or i.i.d., and adapts dynamically between the standard Kullback-Leibler upper confidence bound (KL-UCB) algorithm and a specialized variant. The lecture is delivered by Prof. Arghyadip Roy of the Mehta Family School of Data Science and Artificial Intelligence at IIT Guwahati. It draws on his research in stochastic systems optimization, wireless network resource allocation, and reinforcement learning, carried out at institutions including IIT Bombay, the University of Illinois at Urbana-Champaign, and Jadavpur University.
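To make the two ingredients named above concrete, here is a minimal Python sketch. It is not the lecture's algorithm. It assumes binary (0/1) rewards, shows the textbook Bernoulli KL-UCB index computed by bisection, and uses a simple plug-in check of whether a reward sequence looks Markovian: the total variation distance between the empirical next-reward distributions after a 0 and after a 1 (for an i.i.d. sequence these two distributions coincide). The function names and the threshold of 0.2 are hypothetical choices for illustration, and the specialized KL-UCB variant for Markovian rewards is not shown.

```python
import math

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, tol=1e-6):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log(t), found by bisection."""
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

def row_tv_distance(rewards):
    """TV distance between the empirical next-reward laws after a 0 and after a 1."""
    counts = {0: [0, 0], 1: [0, 0]}
    for prev, nxt in zip(rewards, rewards[1:]):
        counts[prev][nxt] += 1
    if sum(counts[0]) == 0 or sum(counts[1]) == 0:
        return 0.0  # one state never left: no evidence of dependence yet
    p01 = counts[0][1] / sum(counts[0])
    p11 = counts[1][1] / sum(counts[1])
    # For two Bernoulli distributions the TV distance is |p - q|.
    return abs(p01 - p11)

def looks_markovian(rewards, threshold=0.2):
    """Hypothetical decision rule: flag dependence when the rows differ by more than threshold."""
    return row_tv_distance(rewards) > threshold
```

In a bandit loop, a rule like `looks_markovian` could be evaluated per arm to choose which index to compute; how the actual test is calibrated, and how it affects the regret bound, is the subject of the lecture.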

Syllabus

Time: 5:00– PM

Taught by

Centre for Networked Intelligence, IISc

Related Courses

Multi-armed Bandit on System-on-Chip - Go Frequentist or Bayesian?

Risk-Sensitive Bandits - Arm Mixtures Optimality and Regret-Efficient Algorithms

Collaborative Decision-Making Under Adversarial and Information Constraints

Advances in Risk-Aware Multi-Armed Bandit Problems - Lecture 3

Almost-Optimal Best Restless Markov Arm Identification with Fixed Confidence
