Trust Region Policy Optimization

Pascal Poupart via YouTube

Overview

This 23-minute lecture, presented by Shivam Kalra, introduces the Trust Region Policy Optimization (TRPO) algorithm. It starts from reinforcement learning and the problems of policy gradient methods, then recasts reinforcement learning as an optimization problem and asks which loss to optimize. From there it covers the Minorization Maximization (MM) algorithm, the KL-penalized problem and how to solve it, and the Conjugate Gradient (CG) method. The lecture closes with TRPO's KL-constrained formulation and the TRPO algorithm itself.

Syllabus

Intro
Reinforcement Learning
Problems of Policy Gradient
RL to Optimization
What loss to optimize?
New State Visitation is Difficult
Minorization Maximization (MM) algorithm
Solving KL-Penalized Problem
Conjugate Gradient (CG)
TRPO: KL-Constrained
TRPO Algorithm

Taught by

Pascal Poupart
