Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning
GERAD Research Center via YouTube
Overview
Syllabus
Intro
Motivations
Policy-Space Response Oracles (PSRO) [Lanctot et al. '17]: maintains a pool of strategies for each player, and iteratively.
Motivating Example: "Deal-or-No-Deal" [1]
Example: Bach or Stravinsky
PSRO on games beyond purely adversarial domains (no search)
Extending AlphaZero to Large Imperfect Information Games
MCTS in PSRO: A Bayesian Interpretation
Taught by
GERAD Research Center