LLM Full fine-tuning with lower VRAM
Class Central Classrooms (beta): YouTube videos curated by Class Central.
Classroom Contents
Full Fine-tuning LLMs with Lower VRAM: Optimizers, GaLore, and Advanced Techniques
- 1 LLM Full fine-tuning with lower VRAM
- 2 Video Overview
- 3 Understanding Optimizers
- 4 Stochastic Gradient Descent (SGD)
- 5 AdamW Optimizer and VRAM requirements (see the memory sketch after this list)
- 6 AdamW 8-bit optimizer
- 7 Adafactor optimizer and memory requirements
- 8 GaLore - reducing gradient and optimizer VRAM
- 9 LoRA versus GaLore
- 10 Better and Faster GaLore via Subspace Descent
- 11 Layerwise gradient updates
- 12 Training Scripts
- 13 How gradient checkpointing works to reduce memory
- 14 AdamW Performance
- 15 AdamW 8-bit Performance
- 16 Adafactor with manual learning rate and schedule
- 17 Adafactor with default/auto learning rate
- 18 GaLore AdamW (see the Trainer sketch after this list)
- 19 GaLore AdamW with Subspace Descent
- 20 Using AdamW 8-bit and Adafactor with GaLore
- 21 Notebook demo of layerwise gradient updates
- 22 Running with LoRA
- 23 Inferencing and Pushing Models to the Hub
- 24 Single GPU Recommendations
- 25 Multi-GPU Recommendations
- 26 Resources
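As a companion to item 5 (AdamW Optimizer and VRAM requirements), here is a rough back-of-envelope sketch of why full fine-tuning with AdamW needs so much VRAM. It assumes the common mixed-precision accounting of roughly 16 bytes per parameter; the 7B parameter count and the exact byte breakdown are illustrative assumptions, not figures taken from the videos.

```python
# Back-of-envelope VRAM estimate for full fine-tuning with AdamW in mixed precision.
# Illustrative assumption: a 7B-parameter model; activations, buffers, and
# memory fragmentation are ignored, so real usage will be higher.
params = 7e9
bf16_weights = 2 * params   # model weights stored in bf16 (2 bytes each)
bf16_grads   = 2 * params   # gradients in bf16
fp32_master  = 4 * params   # fp32 master copy of the weights
adam_m       = 4 * params   # AdamW first-moment state (fp32)
adam_v       = 4 * params   # AdamW second-moment state (fp32)

total_gb = (bf16_weights + bf16_grads + fp32_master + adam_m + adam_v) / 1e9
print(f"~{total_gb:.0f} GB before activations")  # ~112 GB, i.e. ~16 bytes per parameter
```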
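Items 12 and 18 cover training scripts with GaLore AdamW. Below is a minimal sketch of what such a run can look like using the GaLore integration in Hugging Face transformers (`optim="galore_adamw"` together with `optim_target_modules`), which requires the galore-torch package. The model name, dataset, target-module patterns, and hyperparameters are placeholders chosen for illustration, not the exact settings used in the videos.

```python
# Minimal sketch: full fine-tuning with GaLore through the Hugging Face Trainer.
# Assumes `pip install transformers datasets galore-torch`. The model, dataset,
# target-module patterns, and GaLore hyperparameters below are placeholders.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "Qwen/Qwen2.5-0.5B"  # placeholder small model
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Placeholder dataset: a tiny slice of plain text, tokenized for causal LM training.
dataset = load_dataset("imdb", split="train[:1%]")
dataset = dataset.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                      batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="galore-demo",
    per_device_train_batch_size=1,
    gradient_checkpointing=True,           # recompute activations to save memory (item 13)
    optim="galore_adamw",                  # or "galore_adamw_8bit" / "galore_adafactor"
    optim_target_modules=["attn", "mlp"],  # project these modules' gradients to a low-rank subspace
    optim_args="rank=128, update_proj_gap=200, scale=0.25",  # illustrative GaLore settings
    learning_rate=1e-5,
    max_steps=50,
    logging_steps=10,
    bf16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The layerwise variants of these optimizer names (e.g. `optim="galore_adamw_layerwise"`), which update one layer at a time to cut gradient memory further, correspond to the layerwise gradient updates demonstrated in items 11 and 21.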