Overview
Dive into the second part of a three-part lecture series on game theoretic learning and spectrum management presented by Amir Leshem and Kobi Cohen for the IEEE Signal Processing Society. Explore key concepts such as the single-player multi-armed bandit (MAB) problem, the stochastic MAB formulation, and sublinear regret. Examine various algorithms including UCB1, epsilon-greedy, and adaptive sequential algorithms. Investigate Markovian rewards, restless MABs, and regret minimization techniques. Learn about exploration and exploitation network structures, reinforcement learning, and deep reinforcement learning applications. Gain insights into single-agent learning and exploration phases through simulations and practical examples in this one-hour lecture.
Syllabus
Game Theoretic Learning
Single-Player Multi-Armed Bandit
Motivation for Multi-Armed Bandits
Stochastic MAB Formulation
Sublinear Regret
UCB1
Epsilon-Greedy
Markovian Reward
Restless MAB
Goal Notation
Regret Minimization
Adaptive Sequential Algorithms
Exploration Network Structure
Exploitation Networks
Simulations
Reinforcement Learning
Deep Reinforcement Learning
The Problem
The Algorithm
Single Agent Learning
Exploration Phase A
Taught by
IEEE Signal Processing Society