Adaptive Multi-armed Bandit Algorithms for Markovian and IID Rewards
Centre for Networked Intelligence, IISc via YouTube
Overview
Explore a technical lecture on multi-armed bandit (MAB) algorithms that handles both Markovian and independent and identically distributed (i.i.d.) rewards. Delve into the challenges of obtaining regret guarantees for MAB problems in which arm rewards form Markov chains that lie outside single-parameter exponential families. Learn about an algorithm that uses a test based on total variation distance to determine whether the rewards are Markovian or i.i.d., and then adapts dynamically between the standard Kullback-Leibler upper confidence bound (KL-UCB) approach and a KL-UCB approach specialized for Markovian rewards. The lecture is delivered by Prof. Arghyadip Roy of the Mehta Family School of Data Science and Artificial Intelligence at IIT Guwahati. He draws on his research experience in stochastic systems optimization, resource allocation in wireless networks, and reinforcement learning, gained through work at institutions including IIT Bombay, the University of Illinois at Urbana-Champaign, and Jadavpur University.
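For readers new to the two ingredients named above, here is a minimal illustrative sketch in Python. It is not the lecturer's algorithm. It shows (1) the textbook KL-UCB index for Bernoulli rewards, computed by bisection with the simplified exploration budget log(t) (the common extra c·log log t term is omitted), and (2) one simple way to use total variation distance to check for Markov dependence: compare the empirical next-state distributions conditioned on different previous states, which should coincide for i.i.d. rewards. The function names, the 0.2 threshold in `looks_markovian`, and the choice of this particular test statistic are hypothetical assumptions for illustration. The Markov-specialized KL-UCB index discussed in the lecture is not reproduced here.

```python
import math
from collections import Counter, defaultdict


def kl_bernoulli(p: float, q: float) -> float:
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)), clipped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean: float, pulls: int, t: int, tol: float = 1e-6) -> float:
    """Largest q >= mean with pulls * KL(mean, q) <= log(t), found by bisection.
    Simplified budget: log(t) only (assumption; no log log t term)."""
    budget = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= budget:
            lo = mid  # mid is still inside the confidence region
        else:
            hi = mid
    return lo


def choose_arm(means, pulls, t):
    """Standard (i.i.d.) KL-UCB: pull each arm once, then maximize the index."""
    for arm, n in enumerate(pulls):
        if n == 0:
            return arm
    indices = [kl_ucb_index(m, n, t) for m, n in zip(means, pulls)]
    return max(range(len(indices)), key=indices.__getitem__)


def max_conditional_tv(seq):
    """Hypothetical test statistic: the largest total variation distance between
    empirical next-state distributions conditioned on different previous states.
    Under i.i.d. rewards all conditional rows estimate the same marginal."""
    states = sorted(set(seq))
    rows = defaultdict(Counter)
    for prev, nxt in zip(seq, seq[1:]):
        rows[prev][nxt] += 1
    dists = {}
    for s, counts in rows.items():
        total = sum(counts.values())
        dists[s] = {x: counts[x] / total for x in states}
    keys = list(dists)
    best = 0.0
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            a, b = dists[keys[i]], dists[keys[j]]
            best = max(best, 0.5 * sum(abs(a[x] - b[x]) for x in states))
    return best


def looks_markovian(seq, threshold=0.2):
    """Flag Markov dependence when the statistic exceeds a hypothetical threshold."""
    return max_conditional_tv(seq) > threshold
```

For example, the strictly alternating reward sequence `[0, 1] * 50` gives a statistic of 1.0 and is flagged as Markovian, while a long sequence of independent fair coin flips gives a statistic near 0. In an adaptive scheme of the kind described, such a flag would decide which index to use for the arm. A real test would also need a threshold that shrinks with the sample size to support regret guarantees.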
Syllabus
Time: 5:00– PM
Taught by
Centre for Networked Intelligence, IISc