
Aligning Language Models with LESS Data and Simple Preference Optimization (SimPO)

Massachusetts Institute of Technology via YouTube

Overview

Watch a one-hour research seminar hosted by MIT in which Princeton PhD candidate Mengzhou Xia presents two algorithms for improving language model alignment. Learn about LESS, a model- and optimizer-aware data selection algorithm that matches or outperforms training on a full dataset while using only 5% of carefully selected training data, and SimPO, a reference-free reward formulation that outperforms existing offline preference optimization methods. Discover how these approaches strengthen supervised fine-tuning and preference optimization in language models, including a SimPO-trained Gemma-2-9B model that achieved top performance among models under 10B parameters. Gain insights from Xia's research on building effective language models through data-centric approaches and objective design within academic compute constraints, drawing on her experience as an Apple Scholars in AIML PhD Fellow and a Bloomberg Data Science PhD Fellow.
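
For a concrete picture of the first method, here is a rough sketch of the selection step behind LESS. It is illustrative only: the function name, tensor shapes, and scoring rule are assumptions for this sketch, and the full algorithm additionally uses LoRA gradient features, Adam-aware reweighting, and multiple training checkpoints, all of which are omitted here.

```python
import torch
import torch.nn.functional as F

def less_select(train_grads: torch.Tensor, val_grads: torch.Tensor, budget: int):
    """Hypothetical sketch of LESS-style gradient-based data selection.

    train_grads: (N, d) projected gradient features, one row per
                 candidate training example
    val_grads:   (M, d) projected gradient features for validation
                 examples from the target task
    budget:      number of training examples to keep (e.g. 5% of N)
    """
    # Cosine similarity between every candidate and every validation example.
    train_grads = F.normalize(train_grads, dim=1)
    val_grads = F.normalize(val_grads, dim=1)
    scores = train_grads @ val_grads.T          # (N, M) similarity matrix
    # Score each candidate by its best match to the target task,
    # then keep the highest-scoring examples.
    per_example = scores.max(dim=1).values
    return per_example.topk(budget).indices
```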
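
SimPO's key idea is to use the length-normalized log-likelihood of a response as an implicit reward, removing the reference model that DPO requires. Below is a minimal PyTorch sketch of the published loss; the hyperparameter defaults are placeholders, and the helper assumes summed token log-probabilities have already been computed.

```python
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps,
               chosen_lens, rejected_lens,
               beta: float = 2.0, gamma: float = 0.5):
    """Reference-free SimPO objective (sketch).

    chosen_logps / rejected_logps: summed token log-probabilities of
    the preferred and dispreferred responses under the current policy.
    No reference-model log-probs are needed, unlike DPO.
    """
    # Implicit reward: length-normalized average log-likelihood.
    r_chosen = beta * chosen_logps / chosen_lens
    r_rejected = beta * rejected_logps / rejected_lens
    # Bradley-Terry loss with a target reward margin gamma.
    return -F.logsigmoid(r_chosen - r_rejected - gamma).mean()
```

Because this implicit reward is exactly the average log-likelihood that guides generation at inference time, the formulation aligns the training objective with decoding, which is the source of the gains the talk reports.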

Syllabus

EI Seminar - Mengzhou Xia - Aligning Language Models with LESS Data and a Simple (SimPO) Objective

Taught by

MIT Embodied Intelligence

