Formal Languages and Automata for Reward Function Specification and Efficient Reinforcement Learning

Overview

Explore formal languages and automata for reward function specification and efficient reinforcement learning in this lecture by Sheila McIlraith of the University of Toronto. Review the challenges of real-world reinforcement learning, focusing on how goals and preferences are expressed. Examine Linear Temporal Logic (LTL) as a logic for expressing temporal properties of traces. Learn what reward machines are and how they are used to define reward functions. Compare methods that exploit reward machine structure against a Q-Learning baseline, including Option-Based Hierarchical RL (HRL), HRL with RM-Based Pruning (HRL-RM), Q-Learning for Reward Machines (QRM), and QRM with reward shaping. Analyze experimental results from the discrete Office World and Minecraft World domains and from Water World, where QRM is paired with function approximation. Investigate techniques for creating reward machines, including construction from formal languages and generation with a symbolic planner. Gain insight into reward specification, including learning reward machines for partially observable reinforcement learning.
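To make the idea of defining a reward function with a reward machine concrete, here is a minimal Python sketch. It is not taken from the lecture: the `RewardMachine` class, the state names `u0`, `u1`, `done`, and the "get coffee, then reach the office" task are all hypothetical, chosen only to illustrate a finite-state machine whose transitions fire on high-level events and emit rewards.

```python
from dataclasses import dataclass


@dataclass
class RewardMachine:
    """A finite-state machine over high-level events that emits rewards.

    Hypothetical illustration; names and structure are assumptions.
    """
    initial: str
    terminal: set
    # transitions[(state, event)] = (next_state, reward)
    transitions: dict

    def step(self, state, events):
        """Advance on the events that hold after one environment step."""
        # sorted() makes the choice deterministic if several events match.
        for event in sorted(events):
            if (state, event) in self.transitions:
                return self.transitions[(state, event)]
        return state, 0.0  # no matching edge: stay put, zero reward


def run(rm, trace):
    """Feed a trace (a list of event sets) through the machine."""
    state, total = rm.initial, 0.0
    for events in trace:
        if state in rm.terminal:
            break
        state, reward = rm.step(state, events)
        total += reward
    return state, total


# Assumed task: reward 1 only if "coffee" is observed before "office".
rm = RewardMachine(
    initial="u0",
    terminal={"done"},
    transitions={
        ("u0", "coffee"): ("u1", 0.0),
        ("u1", "office"): ("done", 1.0),
    },
)

in_order = run(rm, [set(), {"coffee"}, set(), {"office"}])
out_of_order = run(rm, [{"office"}, {"coffee"}])
```

In this sketch the reward depends on the history of events rather than on the latest environment state alone, which is the kind of temporally extended behaviour the overview associates with LTL and reward machines. The environment's transition dynamics do not appear anywhere in the machine.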

Syllabus

Intro
Acknowledgements
Reinforcement Learning (RL)
Challenges of Real-World RL
Goals and Preferences
Linear Temporal Logic (LTL): A compelling logic to express temporal properties of traces
Challenges to RL
Toy Problem Disclaimer
Running Example
Decoupling Transition and Reward Functions
The Rest of the Talk
Define a Reward Function using a Reward Machine
Reward Function Vocabulary
Simple Reward Machine
Reward Machines in Action
Other Reward Machines
Q-Learning Baseline
Option-Based Hierarchical RL (HRL)
HRL with RM-Based Pruning (HRL-RM)
HRL Methods Can Find Suboptimal Policies
Q-Learning for Reward Machines (QRM)
QRM In Action
Recall: Methods for Exploiting RM Structure
QRM + Reward Shaping (QRM + RS)
Test Domains
Test in Discrete Domains
Office World Experiments
Minecraft World Experiments
Function Approximation with QRM
Water World Experiments
Creating Reward Machines
Reward Specification: one size does not fit all
Construct Reward Machine from Formal Languages
Generate RM using a Symbolic Planner
Learn RMs for Partially-Observable RL
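
As a rough illustration of the "Q-Learning for Reward Machines (QRM)" item in the syllabus, the tabular sketch below keeps one Q-table per reward machine state and updates all of them from every environment step. It reflects a common description of QRM, not code from the lecture. The five-cell corridor, the event names, the hyperparameters, and the helpers `label`, `rm_step`, `train`, and `greedy_steps` are assumptions made for this example.

```python
import random

# Assumed toy environment: a corridor with coffee at cell 0, office at cell 4.
N_POS, COFFEE, OFFICE, START = 5, 0, 4, 2
ACTIONS = (-1, +1)  # move left, move right

# Assumed reward machine: get coffee first, then reach the office.
RM = {("u0", "coffee"): ("u1", 0.0), ("u1", "office"): ("done", 1.0)}
RM_STATES, TERMINAL = ("u0", "u1"), "done"


def label(pos):
    """High-level event observed on arriving at a cell (None if no event)."""
    return "coffee" if pos == COFFEE else "office" if pos == OFFICE else None


def rm_step(u, event):
    """Reward machine transition; unmatched events self-loop with reward 0."""
    return RM.get((u, event), (u, 0.0))


def move(pos, action_index):
    """Deterministic corridor dynamics with walls at both ends."""
    return min(max(pos + ACTIONS[action_index], 0), N_POS - 1)


def train(episodes=2000, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # One Q-table per non-terminal reward machine state.
    q = {u: [[0.0, 0.0] for _ in range(N_POS)] for u in RM_STATES}
    for _ in range(episodes):
        pos, u = START, "u0"
        for _ in range(50):
            a = rng.randrange(2)  # random behaviour; Q-learning is off-policy
            nxt = move(pos, a)
            event = label(nxt)
            # Reuse this one experience to update every RM state's Q-table.
            for v in RM_STATES:
                v_next, r = rm_step(v, event)
                bootstrap = 0.0 if v_next == TERMINAL else max(q[v_next][nxt])
                q[v][pos][a] += alpha * (r + gamma * bootstrap - q[v][pos][a])
            u, _reward = rm_step(u, event)
            pos = nxt
            if u == TERMINAL:
                break
    return q


def greedy_steps(q, max_steps=20):
    """Follow the greedy policy and count steps until the task is done."""
    pos, u, steps = START, "u0", 0
    while u != TERMINAL and steps < max_steps:
        a = max(range(2), key=lambda i: q[u][pos][i])
        pos = move(pos, a)
        u, _reward = rm_step(u, label(pos))
        steps += 1
    return steps


q_tables = train()
steps_needed = greedy_steps(q_tables)
```

The inner loop over `RM_STATES` is the part that uses the reward machine's structure: because the machine is known, each environment transition can be replayed as if the agent had been in any machine state. Here the greedy policy is read from the table for the current machine state, so the agent heads left for coffee while in `u0` and right for the office while in `u1`.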