Maximum Entropy Reinforcement Learning

Overview

This 42-minute lecture from Pascal Poupart's CS885 course at the University of Waterloo covers maximum entropy reinforcement learning. It introduces key concepts such as encouraging stochasticity, the optimal policy, the Q-function, and the greedy policy. It then covers soft Q-value iteration, soft Q-learning, and soft policy iteration, including the policy improvement step and its proof derivation. The lecture closes with the Soft Actor-Critic (SAC) algorithm and its empirical results, with a focus on robustness to environment changes. The accompanying slides are available on the course website.
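For readers who want a concrete picture of soft Q-value iteration before watching, here is a minimal tabular sketch. It is not taken from the lecture or its slides. It assumes the common formulation in which the max in the Bellman backup is replaced by a temperature-scaled log-sum-exp, and the resulting policy is proportional to exp(Q / temperature). The two-state MDP and all names (`soft_value`, `soft_q_value_iteration`, `soft_greedy_policy`, `temperature`) are hypothetical and chosen only for illustration.

```python
import math


def soft_value(q_row, temperature):
    """Soft maximum of one state's Q-values: T * log(sum(exp(q / T)))."""
    m = max(q_row)  # subtract the max before exponentiating, for numerical stability
    return m + temperature * math.log(
        sum(math.exp((q - m) / temperature) for q in q_row)
    )


def soft_q_value_iteration(rewards, transitions, gamma=0.9, temperature=1.0, sweeps=500):
    """Repeat the soft Bellman backup a fixed number of times.

    rewards[s][a] is the immediate reward; transitions[s][a][s2] is the
    probability of moving from state s to state s2 under action a.
    """
    n_states, n_actions = len(rewards), len(rewards[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        v = [soft_value(row, temperature) for row in q]
        q = [
            [
                rewards[s][a]
                + gamma * sum(p * v[s2] for s2, p in enumerate(transitions[s][a]))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
    return q


def soft_greedy_policy(q, temperature):
    """Stochastic policy with pi(a|s) = exp((Q(s,a) - V(s)) / T)."""
    policy = []
    for row in q:
        v = soft_value(row, temperature)
        policy.append([math.exp((x - v) / temperature) for x in row])
    return policy


# Hypothetical toy MDP: action 0 moves to state 0, action 1 moves to state 1.
rewards = [[1.0, 0.0], [0.0, 2.0]]
transitions = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]

q_soft = soft_q_value_iteration(rewards, transitions, temperature=1.0)
pi_soft = soft_greedy_policy(q_soft, temperature=1.0)

# A small temperature should bring the policy close to the ordinary greedy one.
q_cold = soft_q_value_iteration(rewards, transitions, temperature=0.01)
pi_cold = soft_greedy_policy(q_cold, temperature=0.01)
```

With a temperature of 1.0 every action keeps nonzero probability, which is the stochasticity the lecture title refers to. Lowering the temperature concentrates the probability on the highest-valued action.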

Syllabus

Intro
Maximum Entropy RL
Reinforcement Learning
Encouraging Stochasticity
Optimal Policy
Q-function
Greedy Policy
Greedy Value function
Soft Q-Value Iteration
Soft Q-learning
Soft Policy Iteration
Policy improvement
Inequality derivation
Proof derivation
Soft Actor-Critic
Soft Actor-Critic (SAC)
Empirical Results
Robustness to Environment Changes