GPT-Fast - Blazingly Fast Inference with PyTorch

Aleksa Gordić - The AI Epiphany via YouTube

Classroom Contents

  1. 00:00 - Intro
  2. 00:45 - HyperStack GPUs! (sponsored)
  3. 02:23 - What is GPT-Fast?
  4. 08:40 - PyTorch compile
  5. 28:15 - int8 quantization
  6. 32:15 - Speculative Decoding
  7. 40:12 - Int 4 quantization
  8. 42:05 - Putting it all together, tensor parallelism
  9. 45:25 - Bonus optimizations
  10. 58:10 - Outro, questions
