

Prover-Verifier Games for Improving LLM Output Legibility

Simons Institute via YouTube

Overview

Watch a 41-minute research talk from OpenAI's Yining Chen at the Simons Institute exploring how prover-verifier games can improve the legibility and verifiability of large language model (LLM) outputs. Discover a training algorithm, inspired by prover-verifier games, that makes LLM solutions to grade-school math problems clearer and easier to check. Learn how the algorithm alternately trains a small verifier to predict whether solutions are correct, a "helpful" prover to produce correct solutions the verifier accepts, and a "sneaky" prover to produce incorrect solutions that try to fool the verifier. Examine how this checkability training transfers to human verification, with results showing that humans become more accurate at checking the helpful prover's solutions while the sneaky prover learns increasingly subtle flaws. Understand the broader implications for AI alignment and the potential of legibility training against small verifiers to make large language models more interpretable to human users.
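
To make the game structure described above concrete, here is a minimal toy sketch in Python. It is an illustrative assumption, not the talk's or paper's implementation: solutions are abstracted to an (is_correct, clarity) pair, the "verifier" is a one-feature logistic model, and best-of-n sampling stands in for reinforcement learning on the prover. All names and constants are hypothetical.

import math
import random

random.seed(0)

def sample_solution(role):
    # Hypothetical stand-in for an LLM sampling a solution.
    # Helpful provers tend to be correct; sneaky provers tend not to be.
    is_correct = random.random() < (0.8 if role == "helpful" else 0.2)
    # In this toy world, correct reasoning tends to read as clearer.
    clarity = min(1.0, 0.6 * random.random() + (0.4 if is_correct else 0.0))
    return is_correct, clarity

def train_verifier(dataset):
    # Verifier step: fit P(correct | clarity) by logistic regression.
    w, b, lr = 0.0, 0.0, 0.5
    for _ in range(200):
        for is_correct, clarity in dataset:
            p = 1.0 / (1.0 + math.exp(-(w * clarity + b)))
            grad = p - (1.0 if is_correct else 0.0)
            w -= lr * grad * clarity
            b -= lr * grad
    return lambda clarity: 1.0 / (1.0 + math.exp(-(w * clarity + b)))

def prover_reward(role, is_correct, accept_prob):
    # Helpful provers are rewarded for convincing AND correct solutions;
    # sneaky provers for convincing but incorrect ones.
    on_target = is_correct if role == "helpful" else not is_correct
    return accept_prob if on_target else 0.0

# Seed the verifier's training set with solutions from both roles.
dataset = [sample_solution(random.choice(["helpful", "sneaky"]))
           for _ in range(500)]

for round_num in range(3):
    verify = train_verifier(dataset)
    for role in ("helpful", "sneaky"):
        # Prover step: best-of-n sampling as a crude stand-in for RL.
        candidates = [sample_solution(role) for _ in range(16)]
        best = max(candidates,
                   key=lambda s: prover_reward(role, s[0], verify(s[1])))
        dataset.append(best)  # next round's verifier trains on these
        print(f"round {round_num} {role}: correct={best[0]}, "
              f"clarity={best[1]:.2f}, accept={verify(best[1]):.2f}")

In the setting the talk describes, the verifier and provers are language models rather than toy samplers, with the verifier trained on correctness labels and the provers optimized by reinforcement learning, but the alternation of verifier and prover steps follows the same shape as this sketch.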

Syllabus

Prover-Verifier Games Improve Legibility of LLM Outputs

Taught by

Simons Institute

