CriticGPT: Understanding RLHF and Force Sampling Beam Search Optimization

Discover AI via YouTube

Overview

Explore a technical deep dive into how OpenAI combined an optimized Reinforcement Learning from Human Feedback (RLHF) algorithm with Force Sampling Beam Search (FSBS) to improve Large Language Model performance. Learn the motivation behind the technique and where it sits among current LLM optimization methods. Understand how CriticGPT uses these approaches to improve model quality and catch errors, drawing on OpenAI's research into using GPT-4 to identify its own mistakes. The video also covers technical aspects of AI agent development and research, along with practical applications of these optimization strategies.

Syllabus

NEW CriticGPT by OpenAI: RLHF + FSBS

Taught by

Discover AI
