Introduction to Reinforcement Learning

Overview

Dive into the world of Reinforcement Learning (RL) with this comprehensive talk by Ben Duffy. Explore the evolution of sequential decision making, from achieving superhuman performance in complex board games to solving 2D Atari and 3D games like Doom, Quake, and StarCraft. Gain insights into the pursuit of creating artificial general intelligence and understand the main breakthroughs, paradigms, formulations, and obstacles within RL. Learn about the agent-environment loop, core concepts such as state, reward, value functions, and policies, and delve into model-based and model-free RL approaches. Discover applications in robotics, language grounding, and multi-agent collaboration. Examine deep reinforcement learning techniques, including value function approximation and policy gradients, and their applications in various domains. Get up to speed with the current state of the field and its future directions in this informative one-hour lecture.

Syllabus

Intro
Part One: Reinforcement Learning (RL)
Applications: Board Games
Applications: 2D Video Games
Applications: Simulated 3D Robotics
Applications: Robotics
Applications: "World Models"
Applications: Language grounding
Applications: Multi-agent collaboration
The Formulation
Agent-Environment Loop in code
Core Concepts: State(s)
Core Concepts: Complex State(s)
Core Concepts: Reward(s)
Core Concepts: Return and Discount → The Return G_t is the total discounted reward from time-step t
Core Concepts: Value Function(s)
Core Concepts: Policies
Core Concepts: Markov Assumption
Core Concepts: Markov Decision Process
Model-based: Dynamic Programming
Model-based Reinforcement Learning
Bellman equation
Policy evaluation example
Generalized Policy Iteration
GridWorlds: Sokoban
The rest of the iceberg
Continuous action/state spaces
Exploration vs Exploitation
Credit Assignment
Sparse, noisy and delayed rewards
Reward hacking
Model-free: Reinforcement Learning
Monte Carlo evaluation
Temporal difference evaluation
Q-learning: Tabular setting
OpenAI Gym
DeepMind Lab
Part Two: Deep Reinforcement Learning
Value function approximation
Policy Gradients: Baseline and Advantage
Policy Gradients: Actor-Critic for StarCraft II
Policy Gradients: PPO for Dota 2
Policy Gradients: PPO for robotics
Policy Gradients: Sonic Retro Contest
Big picture view of the main algorithms
More RL applications
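
The syllabus entry on return and discount defines the return G_t as the total discounted reward from time-step t. As a minimal illustration, not taken from the talk itself, the sketch below computes that quantity for a finite list of rewards. The function name `discounted_return`, the discount factor `gamma`, and the example reward values are assumptions chosen for this example.

```python
def discounted_return(rewards, gamma=0.9):
    """Compute G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ...

    `rewards` is the (hypothetical) list of rewards received after time-step t,
    in order; `gamma` is the discount factor, assumed to lie in [0, 1].
    """
    g = 0.0
    # Walk backwards so each step applies the recursion G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Illustrative values only: three rewards and a discount of 0.5.
example = discounted_return([1.0, 0.0, 2.0], gamma=0.5)
print(example)
```

With `gamma` close to 0 the return is dominated by the immediate reward, while values close to 1 weight later rewards almost as heavily as early ones.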