Overview
Explore typical decoding for natural language generation in this 49-minute video lecture. Learn why generating human-like text from language models is difficult and how a new decoding method, typical sampling, addresses the problem. Understand the trade-off between high-probability and high-information samples, and how this approach connects to psycholinguistic theories of human speech generation. Examine the limitations of current sampling methods such as top-k and nucleus sampling, and see how typical sampling, which favors words whose information content is close to the expected information content of the next word, is presented as a more principled and effective alternative. Follow along as the video breaks down the paper's key ideas, experimental results, and potential implications for improving text generation from AI language models.
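For readers who want a concrete picture before watching, the core idea can be sketched in a few lines of Python. This is a minimal illustration, not the paper's reference implementation: the function names `typical_filter` and `typical_sample`, the threshold parameter `tau`, and the toy next-word distribution are all assumptions made for this example. The sketch ranks words by how far their information content (negative log-probability) lies from the entropy of the distribution, keeps the closest words until their combined probability reaches `tau`, and samples from the renormalized remainder.

```python
import math
import random


def typical_filter(probs, tau=0.95):
    """Keep the words whose information content is closest to the entropy.

    probs: dict mapping word -> probability (hypothetical toy input).
    tau:   probability mass to retain (assumed name for the threshold).
    """
    # Entropy is the expected information content of the next word (in nats).
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)

    # Rank words by |information content - entropy|, smallest first.
    ranked = sorted(
        (word for word, p in probs.items() if p > 0),
        key=lambda word: abs(-math.log(probs[word]) - entropy),
    )

    # Keep the closest words until their cumulative probability reaches tau.
    kept, mass = [], 0.0
    for word in ranked:
        kept.append(word)
        mass += probs[word]
        if mass >= tau:
            break

    # Renormalize so the kept probabilities sum to 1.
    return {word: probs[word] / mass for word in kept}


def typical_sample(probs, tau=0.95, rng=random):
    """Draw one word from the filtered, renormalized distribution."""
    filtered = typical_filter(probs, tau)
    words = list(filtered)
    weights = [filtered[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


# Toy next-word distribution, invented for illustration only.
toy_probs = {"the": 0.5, "a": 0.2, "cat": 0.15, "dog": 0.1, "zebra": 0.05}
```

With a small threshold such as `tau=0.3`, the filter on this toy distribution keeps "a" and "cat" and drops "the", even though "the" is the most probable word. That is the contrast with top-k and nucleus sampling, which always retain the highest-probability words first.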
Syllabus
- Intro
- Sponsor: Fully Connected by Weights & Biases
- Paper Overview
- What's the problem with sampling?
- Beam Search: The good and the bad
- Top-k and Nucleus Sampling
- Why the most likely things might not be the best
- The expected information content of the next word
- How to trade off information and likelihood
- Connections to information theory and psycholinguistics
- Introducing Typical Sampling
- Experimental Evaluation
- My thoughts on this paper
Taught by
Yannic Kilcher