Overview
Syllabus
Intro
MARKOV DECISION PROCESSES
ADVERSARIAL
PERFORMANCE MEASURE: REGRET
OUTLINE
NON-OBLIVIOUS ADVERSARIES
WHAT WENT WRONG?
OBLIVIOUS ADVERSARIES
LEARNING WITH CHANGING TRANSITIONS IS HARD
PROOF CONSTRUCTION
SLOWLY CHANGING MDPS
FORMAL PROTOCOL: Online learning in a fixed MDP. For each round t = 1, 2, ..., the learner observes state x_t ∈ X
TEMPORAL DEPENDENCES
REGRET DECOMPOSITION
THE DRIFT TERMS
LOCAL-TO-GLOBAL
THE MDP-EXPERT ALGORITHM
GUARANTEES FOR MDP-E
BANDIT FEEDBACK
ONLINE LINEAR OPTIMIZATION
ONLINE MIRROR DESCENT
THE ONLINE REPS ALGORITHM (O-REPS)
GUARANTEES FOR O-REPS
COMPARISON OF GUARANTEES
MDP-E WITH FUNCTION APPROXIMATION: MDP-E only needs a good approximation of the action-value function
O-REPS WITH UNCERTAIN MODELS
OUTLOOK
Taught by
Simons Institute