Challenges in Reward Design for Reinforcement Learning-based Traffic Signal Control - An Investigation Using CO2 Emission Objective
Eclipse Foundation via YouTube
Syllabus
Intro
The Importance of Aligning Powerful AI Systems
Reinforcement Learning Example: Cliff Walking
Aligning TSC Agents with Rewards
Objective: Minimizing CO2 Emission at a Signalized Intersection
Reinforcement Learning Setup
Training the Neural Network - Deep Q-Network (DQN)
Motivation - Uninformative Emission Penalty
Informativeness and Expressiveness for Alignment
Findings Comparing Rewards
Findings - Rewards are sensitive to parameterization
Conclusion - Informativeness and Expressiveness are necessary
Technologies that helped a LOT
Taught by
Eclipse Foundation