Overview
Explore a technical deep-dive video on how OpenAI combined an optimized Reinforcement Learning from Human Feedback (RLHF) algorithm with Force Sampling Beam Search (FSBS) to improve Large Language Model performance. Learn the motivations behind this technique and gain insight into current LLM optimization methods. Understand how CriticGPT uses these approaches to improve model quality and catch potential errors, drawing on OpenAI's research papers on using GPT-4 to identify its own mistakes. Examine the technical aspects of AI agent development and research, along with real-world applications of these optimization strategies.
Syllabus
NEW CriticGPT by OpenAI: RLHF + FSBS
Taught by
Discover AI