Classroom Contents
GPT-Fast - Blazingly Fast Inference with PyTorch
- 1 00:00 - Intro
- 2 00:45 - HyperStack GPUs! sponsored
- 3 02:23 - What is GPT-Fast?
- 4 08:40 - PyTorch compile
- 5 28:15 - int8 quantization
- 6 32:15 - Speculative Decoding
- 7 40:12 - int4 quantization
- 8 42:05 - Putting it all together, tensor parallelism
- 9 45:25 - Bonus optimizations
- 10 58:10 - Outro, questions