Population-Based Methods for Single- and Multi-Agent Reinforcement Learning - Lecture
USC Information Sciences Institute via YouTube
Syllabus
Welcome to the AI Seminar Series
Reinforcement Learning (RL)
RL basics
Deep Q-learning (DQN)
Why use target network?
Why reduce estimation variance?
Ensemble RL methods
Ensemble RL for variance reduction
MeanQ design choices
Combining with existing techniques
Experiment results (100K interaction steps)
Obviating the target network
Comparing model size and update rate
MeanQ: variance reduction
Loss of ensemble diversity
Linear function approximation
Diversity through independent sampling
Ongoing investigation
Takeaways
Fictitious Play
What to do in large dynamical environments?
PSRO convergence properties
Extensive-Form Double Oracle (XDO)
XDO: results
XDO convergence properties
Taught by
USC Information Sciences Institute