Explore a thought-provoking lecture that delves into the limitations of next-token prediction in modeling human intelligence. Examine the critical distinction between autoregressive inference and teacher-forced training in language models. Discover why the popular criticism of error compounding during autoregressive inference may overlook a more fundamental issue: the potential failure of teacher-forcing to learn accurate next-token predictors on certain task classes. Investigate a general mechanism behind this teacher-forcing failure and analyze empirical evidence from a minimal planning task on which both Transformer and Mamba architectures struggle. Consider training models to predict multiple tokens in advance as a possible remedy. Gain insights that can inform future debates and inspire research beyond the current next-token prediction paradigm in artificial intelligence.
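The lecture's core contrast, teacher-forced training versus autoregressive inference, can be made concrete with a short sketch. The PyTorch code below is illustrative and not taken from the lecture; the toy model (an embedding followed by a linear head) and names such as model and teacher_forced_loss are placeholders standing in for a real Transformer or Mamba.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 16, 8, 4

# Stand-in "language model": embedding + linear head. A real Transformer or
# Mamba block would sit in between; the training/inference contrast is the same.
embed = torch.nn.Embedding(vocab_size, 32)
head = torch.nn.Linear(32, vocab_size)

def model(tokens):
    # [batch, seq] -> [batch, seq, vocab]
    return head(embed(tokens))

tokens = torch.randint(0, vocab_size, (batch, seq_len))

# Teacher-forced training: every position conditions on the ground-truth
# prefix, and next-token cross-entropy is computed for all positions in parallel.
logits = model(tokens[:, :-1])
teacher_forced_loss = F.cross_entropy(
    logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1)
)

# Autoregressive inference: the model conditions on its own previous outputs,
# so an early mistake changes every later prefix it sees.
generated = tokens[:, :1]                  # start from the first token
for _ in range(seq_len - 1):
    next_logits = model(generated)[:, -1]  # logits for the next position
    next_token = next_logits.argmax(dim=-1, keepdim=True)
    generated = torch.cat([generated, next_token], dim=-1)
```

The usual criticism targets only the second half of this sketch: small per-step errors compound as the model feeds on its own outputs. The lecture's argument is that on some planning tasks the first half already goes wrong, because teacher-forcing fails to learn an accurate next-token predictor in the first place.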
Syllabus
The Pitfalls of Next-token Prediction
Taught by
Simons Institute