Class Central Classrooms
YouTube videos curated by Class Central.
Classroom Contents
Distillation of Transformer Models - Tutorial and Code Walk-through
- 1 AI model distillation: Whisper, Flux, Minitron, GPT-4o-mini?
- 2 Video Overview - Distillation Tutorial and Code Walk-through
- 3 Distillation Examples: Diffusion (Flux Schnell / Dev), Transcription (Distil-Whisper), LLMs (Nvidia Minitron)
- 4 How distillation works
- 5 Student model initialization
- 6 Layer / depth pruning (a minimal pruning sketch follows the list below)
- 7 Width pruning
- 8 Pre-training versus distillation
- 9 Cross-entropy loss vs. KL divergence (a loss-function sketch follows the list below)
- 10 Instruction fine-tuning
- 11 Distilling SmolLM 135M to a 99M model
- 12 Code walk-through setup
- 13 Pruning Notebook
- 14 Layer Pruning
- 15 Width Pruning
- 16 Why pruning works
- 17 Distillation Script - Multi-GPU Setup
- 18 Distillation Script Walk-through
- 19 Distillation Configuration File Walk-through
- 20 Distillation Startup and Performance Monitoring with TensorBoard
- 21 Instruction fine-tuning and dataset selection
- 22 Instruction Fine-Tuning Startup and Performance Monitoring with TensorBoard
- 23 Running inference to evaluate distillation performance
- 24 Teacher model performance: base SmolLM 135M
- 25 SmolLM Instruct model performance
- 26 Raw pruned model performance: layer-pruned 99M
- 27 Width + layer pruning performance: raw 99M
- 28 Distilled model performance before instruction tuning: 99M
- 29 Instruction tuning performance evaluation
- 30 SmolLM 135M Instruct performance
- 31 Instruction-tuned distilled model performance: 99M model
- 32 Final Tips: best pruning approach, learning rate, batch size, and model size effects
- 33 Video Resources
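
As a companion to the layer / depth pruning chapters above, here is a minimal sketch of depth pruning a SmolLM-style checkpoint by keeping only a subset of decoder layers. The Hub checkpoint id, the Llama-style `.model.layers` attribute, and the fraction of layers kept are assumptions for illustration; the video's pruning notebook may select layers differently.

```python
# Minimal layer (depth) pruning sketch for a Llama-style causal LM such as SmolLM.
# The checkpoint id and the "keep the bottom ~70% of layers" rule are illustrative
# assumptions, not the exact choices made in the pruning notebook.
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")

n_layers = len(model.model.layers)   # decoder layers in the teacher
n_keep = int(n_layers * 0.7)         # assumed ratio to land near a smaller student
keep = list(range(n_keep))           # keep the earliest layers, drop the top ones

# Replace the layer stack and update the config so the pruned model saves/loads cleanly.
model.model.layers = nn.ModuleList([model.model.layers[i] for i in keep])
model.config.num_hidden_layers = len(keep)

model.save_pretrained("smollm-layer-pruned-student")
print(f"kept {len(keep)} of {n_layers} layers")
```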
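
For the cross-entropy vs. KL divergence chapter (item 9), here is a minimal sketch of a blended distillation loss: hard-label cross-entropy plus temperature-scaled KL divergence between the student's and the frozen teacher's token distributions. The temperature, the mixing weight `alpha`, and the function name are illustrative assumptions, not the exact settings used in the distillation script.

```python
# Sketch of a combined distillation loss: cross-entropy on ground-truth tokens
# blended with KL divergence to the teacher's softened distribution.
# temperature and alpha are assumed values for illustration.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Standard next-token cross-entropy against the hard labels.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    # KL divergence between temperature-softened teacher and student distributions.
    t = temperature
    kl = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)  # rescale so gradient magnitudes stay comparable across temperatures
    return alpha * ce + (1.0 - alpha) * kl
```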