Distillation of Transformer Models - Tutorial and Code Walk-through

Trelis Research via YouTube

Classroom Contents

  1. AI model distillation: Whisper, Flux, Minitron, gpt-4o-mini?
  2. Video Overview - Distillation Tutorial and Code Walk-through
  3. Distillation Examples: Diffusion - Flux Schnell / Dev, Transcription - Distil-Whisper, LLMs - Nvidia Minitron
  4. How distillation works
  5. Student model initialization
  6. Layer / depth pruning
  7. Width pruning
  8. Pre-training versus distillation
  9. Cross-entropy loss vs KL-divergence
  10. Instruction fine-tuning
  11. Distilling SmolLM 135M to a 99M model
  12. Code walk-through setup
  13. Pruning Notebook
  14. Layer Pruning
  15. Width Pruning
  16. Why pruning works
  17. Distillation Script - Multi-GPU Setup
  18. Distillation Script Walk-through
  19. Distillation Configuration File Walk-through
  20. Distillation Startup and Performance Monitoring with TensorBoard
  21. Instruction fine-tuning and dataset selection
  22. Instruction FT Startup and Performance Monitoring with TensorBoard
  23. Running inference to evaluate distillation performance
  24. Teacher model performance: base SmolLM 135M
  25. SmolLM Instruct model performance
  26. Raw pruned model performance: layer pruned, 99M
  27. Width + layer pruning performance: raw, 99M
  28. Distilled model performance before instruction tuning: 99M
  29. Instruction tuning performance evaluation
  30. SmolLM 135M Instruct performance
  31. Instruction-tuned distilled model performance: 99M model
  32. Final tips: best pruning approach, learning rate, batch size and model size effects
  33. Video Resources
