Syllabus
Intro
Acknowledgements
Reinforcement Learning (RL)
Challenges of Real-World RL
Goals and Preferences
Linear Temporal Logic (LTL): A compelling logic to express temporal properties of traces
Challenges to RL
Toy Problem Disclaimer
Running Example
Decoupling Transition and Reward Functions
The Rest of the Talk
Define a Reward Function using a Reward Machine
Reward Function Vocabulary
Simple Reward Machine
Reward Machines in Action
Other Reward Machines
Q-Learning Baseline
Option-Based Hierarchical RL (HRL)
HRL with RM-Based Pruning (HRL-RM)
HRL Methods Can Find Suboptimal Policies
Q-Learning for Reward Machines (QRM)
QRM in Action
Recall: Methods for Exploiting RM Structure
QRM + Reward Shaping (QRM + RS)
Test Domains
Test in Discrete Domains
Office World Experiments
Minecraft World Experiments
Function Approximation with QRM
Water World Experiments
Creating Reward Machines
Reward Specification: One Size Does Not Fit All
Construct Reward Machine from Formal Languages
Generate RM using a Symbolic Planner
Learn RMs for Partially-Observable RL
Taught by
Simons Institute