Explore the intricacies of language model alignment in this comprehensive lecture by Ahmad Beirami of Google. Delve into the post-training process, which aims to generate samples from an aligned distribution that improves rewards such as safety and factuality while keeping divergence from the base model small. Examine the best-of-N baseline and more advanced methods that solve the KL-regularized reinforcement learning problem, and gain insight into key results through simplified examples. Discover controlled decoding, a novel modular alignment approach that solves the KL-regularized RL problem by learning a prefix scorer while keeping the base model frozen, offering inference-time configurability. Finally, analyze the surprising effectiveness of best-of-N, which achieves reward-KL tradeoffs competitive with or superior to state-of-the-art alignment baselines.
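For orientation, the KL-regularized alignment objective referred to in the description takes the standard form used in this line of work (a summary, not a transcription from the lecture):

\[
\max_{\pi} \; \mathbb{E}_{x \sim p,\; y \sim \pi(\cdot\mid x)}\big[r(x, y)\big] \;-\; \beta \, \mathrm{KL}\big(\pi(\cdot\mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big),
\]

where $r$ is the reward (e.g., safety or factuality), $\pi_{\mathrm{ref}}$ is the frozen base model, and $\beta$ controls how far the aligned policy may drift from the base model.

The best-of-N baseline mentioned above can be sketched in a few lines: draw N candidate responses from the frozen base model, score each with a reward model, and return the highest-scoring one. The sketch below is an illustration only; `base_model.sample` and `reward_model.score` are hypothetical placeholder interfaces, not an API from the lecture.

```python
def best_of_n(prompt, base_model, reward_model, n=16):
    """Return the highest-reward response among n samples from the base model.

    Assumes placeholder objects: base_model.sample(prompt) -> str and
    reward_model.score(prompt, response) -> float.
    """
    # Sample n candidate responses from the frozen base model.
    candidates = [base_model.sample(prompt) for _ in range(n)]
    # Score each candidate with the reward model.
    scores = [reward_model.score(prompt, y) for y in candidates]
    # Return the candidate with the highest reward.
    best_index = max(range(n), key=lambda i: scores[i])
    return candidates[best_index]
```

Increasing n raises the expected reward but also the KL divergence from the base model (and the inference cost), which is the reward-KL tradeoff the lecture analyzes.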
Overview
Syllabus
Language Model Alignment: Theory & Algorithms
Taught by
Simons Institute