Policy Gradients: Baseline and Advantage
YouTube videos curated by Class Central.
Classroom Contents
Introduction to Reinforcement Learning
- 1 Intro
- 2 Part One: Reinforcement Learning (RL)
- 3 Applications: Board Games
- 4 Applications: 2D Video Games
- 5 Applications: Simulated 3D Robotics
- 6 Applications: Robotics
- 7 Applications: "World Models"
- 8 Applications: Language grounding
- 9 Applications: Multi-agent collaboration
- 10 The Formulation
- 11 Agent-Environment Loop in code
- 12 Core Concepts: State(s)
- 13 Core Concepts: Complex State(s)
- 14 Core Concepts: Reward(s)
- 15 Core Concepts: Return and Discount → The Return G_t is the total discounted reward from time-step t
- 16 Core Concepts: Value Function(s)
- 17 Core Concepts: Policies
- 18 Core Concepts: Markov Assumption
- 19 Core Concepts: Markov Decision Process
- 20 Model-based: Dynamic Programming
- 21 Model-based Reinforcement Learning
- 22 Bellman equation
- 23 Policy evaluation example
- 24 Generalized Policy Iteration
- 25 GridWorlds: Sokoban
- 26 The rest of the iceberg
- 27 Continuous action/state spaces
- 28 Exploration vs Exploitation
- 29 Credit Assignment
- 30 Sparse, noisy and delayed rewards
- 31 Reward hacking
- 32 Model-free: Reinforcement Learning
- 33 Monte Carlo evaluation
- 34 Temporal difference evaluation
- 35 Q-learning: Tabular setting
- 36 OpenAI Gym
- 37 DeepMind Lab
- 38 Part Two: Deep Reinforcement Learning
- 39 Value function approximation
- 40 Policy Gradients: Baseline and Advantage
- 41 Policy Gradients: Actor-Critic for Starcraft 2
- 42 Policy Gradients: PPO for DotA
- 43 Policy Gradients: PPO for robotics
- 44 Policy Gradients: Sonic Retro Contest
- 45 Big picture view of the main algorithms
- 46 More RL applications