Watch a 41-minute research talk from OpenAI's Yining Chen at the Simons Institute exploring how prover-verifier games can enhance the legibility and verifiability of large language model (LLM) outputs. Discover a training algorithm, inspired by prover-verifier games, that aims to improve the clarity and checkability of LLM solutions, particularly on grade-school math problems. Learn how the algorithm trains small verifiers to predict solution correctness while simultaneously developing "helpful" provers that produce correct, convincing solutions and "sneaky" provers that produce subtly incorrect solutions designed to fool the verifier. Examine how this training approach transfers to human verification tasks, with results showing increased human accuracy in checking legitimate solutions and better detection of deceptive ones. Understand the broader implications for AI alignment and the potential for using legibility training with small verifiers to make large language models more interpretable to human users.
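To make the helpful-prover / sneaky-prover / verifier dynamic concrete, here is a minimal toy sketch of the training loop. Everything in it is an illustrative assumption: solutions are random feature vectors rather than LLM-generated text, the verifier is a small logistic model rather than a language model, and the provers simply pick the candidate that maximizes their role-specific reward. The actual algorithm from the talk trains LLM provers and verifiers with reinforcement learning; this only mirrors the reward structure.

```python
# Toy sketch of checkability training with helpful and sneaky provers.
# All representations here (embeddings, logistic verifier, argmax provers)
# are assumptions for illustration, not the method from the talk.
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # dimensionality of toy solution embeddings (assumption)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy verifier: a logistic model scoring how "convincing" a solution looks.
w = np.zeros(DIM)

def verifier_score(x):
    return sigmoid(x @ w)

def sample_candidates(n, p_correct):
    """Draw n toy (embedding, is_correct) candidate solutions."""
    xs = rng.normal(size=(n, DIM))
    ys = rng.random(n) < p_correct
    xs[ys] += 0.5  # hypothetical: correct solutions share a signal direction
    return xs, ys

def helpful_prover():
    # Rewarded for solutions that are correct AND score high with the verifier.
    xs, ys = sample_candidates(8, p_correct=0.7)
    reward = verifier_score(xs) * ys          # zero reward if incorrect
    i = int(np.argmax(reward))
    return xs[i], ys[i]

def sneaky_prover():
    # Rewarded for solutions that are INCORRECT yet still fool the verifier.
    xs, ys = sample_candidates(8, p_correct=0.3)
    reward = verifier_score(xs) * (~ys)       # zero reward if correct
    i = int(np.argmax(reward))
    return xs[i], ys[i]

# Verifier training: learn to predict correctness on a mix of helpful and
# sneaky solutions, tightening against whatever currently fools it.
LR = 0.1
for step in range(2000):
    x, y = helpful_prover() if rng.random() < 0.5 else sneaky_prover()
    p = verifier_score(x)
    w += LR * (float(y) - p) * x              # logistic-regression gradient step

# After training, sneaky solutions should score lower than helpful ones.
hx = np.array([helpful_prover()[0] for _ in range(200)])
sx = np.array([sneaky_prover()[0] for _ in range(200)])
print("mean score, helpful prover:", verifier_score(hx).mean())
print("mean score, sneaky prover: ", verifier_score(sx).mean())
```

The design point this toy preserves is the adversarial reward split: the sneaky prover is paid only when an incorrect solution passes the verifier, so verifier training against it plays the same robustness-pressure role that, per the talk, ends up making legitimate solutions easier for humans to check as well.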