Overview
Syllabus
Intro
Part One: Reinforcement Learning (RL)
Applications: Board Games
Applications: 2D Video Games
Applications: Simulated 3D Robotics
Applications: Robotics
Applications: "World Models"
Applications: Language grounding
Applications: Multi-agent collaboration
The Formulation
Agent-Environment Loop in code
Core Concepts: State(s)
Core Concepts: Complex State(s)
Core Concepts: Reward(s)
Core Concepts: Return and Discount → The return G_t is the total discounted reward from time-step t
Core Concepts: Value Function(s)
Core Concepts: Policies
Core Concepts: Markov Assumption
Core Concepts: Markov Decision Process
Model-based: Dynamic Programming
Model-based Reinforcement Learning
Bellman equation
Policy evaluation example
Generalized Policy Iteration
GridWorlds: Sokoban
The rest of the iceberg
Continuous action/state spaces
Exploration vs Exploitation
Credit Assignment
Sparse, noisy and delayed rewards
Reward hacking
Model-free: Reinforcement Learning
Monte Carlo evaluation
Temporal difference evaluation
Q-learning: Tabular setting
OpenAI Gym
DeepMind Lab
Part Two: Deep Reinforcement Learning
Value function approximation
Policy Gradients: Baseline and Advantage
Policy Gradients: Actor-Critic for Starcraft 2
Policy Gradients: PPO for DotA
Policy Gradients: PPO for robotics
Policy Gradients: Sonic Retro Contest
Big picture view of the main algorithms
More RL applications
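To make the "Return and Discount" definition concrete, here is a minimal sketch that assumes a finite episode. It computes G_t = R_{t+1} + γR_{t+2} + γ²R_{t+3} + ... for every time-step. It works backwards using the recursion G_t = R_{t+1} + γG_{t+1}. The function name `discounted_return` and the list-of-rewards input format are illustrative assumptions, not taken from the course material.

```python
def discounted_return(rewards, gamma):
    """Return [G_0, G_1, ..., G_{T-1}] for a finite episode.

    Assumption (illustrative): rewards[t] is the reward R_{t+1}
    received after acting at time-step t, and the episode ends
    after the last reward, so G_T = 0.
    """
    returns = [0.0] * len(rewards)
    g = 0.0  # running return G_{t+1}, zero past the end of the episode
    # Walk backwards so each step reuses the return of the next step:
    # G_t = R_{t+1} + gamma * G_{t+1}
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # [1.5, 1.0, 2.0]
```

With γ = 0.5, the reward of 2.0 at the last step contributes only 0.5 × 0.5 × 2.0 = 0.5 to G_0. This shows how discounting shrinks the weight of delayed rewards.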
Taught by
Open Data Science