Class Central Classrooms beta
YouTube videos curated by Class Central.
Classroom Contents
Understanding Medusa: A Framework for LLM Inference Acceleration with Multiple Decoding Heads
- 1 Introducing Daniel Varoli from Zapata.ai
- 2 The Problem with LLMs Today
- 3 How We Can Solve These Problems
- 4 Normal vs. Speculative Architecture
- 5 Speculative Decoding Example
- 6 Introducing Medusa
- 7 Medusa’s Decoding Heads
- 8 Generating Tokens With Medusa Heads
- 9 Verifying Candidates With Medusa
- 10 What If We Mess Up?
- 11 Rejection Sampling for Accepting Candidates
- 12 Considering Many Completion Candidates at Once
- 13 Tree Attention Diagrams
- 14 How to Integrate Medusa Into an LLM
- 15 Results