Classroom Contents
Online Learning in Markov Decision Processes - Part 2
- 1 Intro
- 2 MARKOV DECISION PROCESSES
- 3 ADVERSARIAL
- 4 PERFORMANCE MEASURE: RE
- 5 OUTLINE
- 6 NON-OBLIVIOUS ADVERSARI
- 7 WHAT WENT WRONG?
- 8 OBLIVIOUS ADVERSARIES
- 9 LEARNING WITH CHANGING TRANSITIONS IS HARD
- 10 PROOF CONSTRUCTION
- 11 SLOWLY CHANGING MDPS
- 12 FORMAL PROTOCOL: Online learning in a fixed MDP
- 13 TEMPORAL DEPENDENCES
- 14 REGRET DECOMPOSITION
- 15 THE DRIFT TERMS
- 16 LOCAL-TO-GLOBAL
- 17 THE MDP-EXPERT ALGORITHM
- 18 GUARANTEES FOR MDP-E
- 19 BANDIT FEEDBACK
- 20 ONLINE LINEAR OPTIMIZATIO
- 21 ONLINE MIRROR DESCENT
- 22 THE ONLINE REPS ALGORITHM: O-REPS
- 23 GUARANTEES FOR O-REPS
- 24 COMPARISON OF GUARANTE
- 25 MDP-E WITH FUNCTION APPROXIMATION: MDP-E only needs a good approximation of the action-value
- 26 O-REPS WITH UNCERTAIN MO
- 27 OUTLOOK