Full Fine-tuning LLMs with Lower VRAM: Optimizers, GaLore, and Advanced Techniques

Trelis Research via YouTube

Classroom Contents

  1. LLM Full fine-tuning with lower VRAM
  2. Video Overview
  3. Understanding Optimisers
  4. Stochastic Gradient Descent (SGD)
  5. AdamW Optimizer and VRAM requirements
  6. AdamW 8-bit optimizer
  7. Adafactor optimiser and memory requirements
  8. GaLore - reducing gradient and optimizer VRAM
  9. LoRA versus GaLore
  10. Better and Faster GaLore via Subspace Descent
  11. Layerwise gradient updates
  12. Training Scripts
  13. How gradient checkpointing works to reduce memory
  14. AdamW Performance
  15. AdamW 8-bit Performance
  16. Adafactor with manual learning rate and schedule
  17. Adafactor with default/auto learning rate
  18. GaLore AdamW (see the configuration sketch after this list)
  19. GaLore AdamW with Subspace descent
  20. Using AdamW 8-bit and Adafactor with GaLore
  21. Notebook demo of layerwise gradient updates
  22. Running with LoRA
  23. Inferencing and Pushing Models to Hub
  24. Single GPU Recommendations
  25. Multi-GPU Recommendations
  26. Resources
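
Several chapters above (Training Scripts, GaLore AdamW, gradient checkpointing) deal with configuring these optimizers in practice. As a rough illustration only, here is a minimal sketch of how GaLore AdamW full fine-tuning can be set up through Hugging Face transformers' built-in GaLore support; it is not the video's script, and the model name, dataset, and hyperparameter values are placeholder assumptions.

```python
# Minimal sketch: full fine-tuning with the GaLore AdamW optimizer via
# Hugging Face transformers. Requires `pip install transformers datasets galore-torch`.
# Model, dataset, and hyperparameter values are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder small model
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # collator needs a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny slice of a public dataset, tokenized for causal-LM training.
dataset = load_dataset("imdb", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="galore-full-ft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,           # recompute activations to save VRAM
    learning_rate=1e-5,
    optim="galore_adamw",                  # GaLore-projected AdamW optimizer states
    optim_target_modules=["attn", "mlp"],  # apply GaLore to attention and MLP weights
    optim_args="rank=128, update_proj_gap=200, scale=0.25",  # assumed GaLore hyperparameters
    max_steps=100,
    logging_steps=10,
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Swapping `optim` to the 8-bit or Adafactor GaLore variants (or their layerwise counterparts, where supported by your transformers version) covers the other optimizer configurations the chapters compare.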
