Quantizing LLMs and Converting to GGUF Format for Faster and Smaller Models

Venelin Valkov via YouTube

Classroom Contents

  1. Welcome
  2. Text tutorial on MLExpert.io
  3. Fine-tuned model on HuggingFace
  4. Why quantize your model?
  5. Google Colab Setup
  6. Install llama.cpp
  7. Convert the HF model to GGUF (see the conversion sketch after this list)
  8. Run the quantized model with llama-cpp-python (see the inference sketch after this list)
  9. Evaluate the full-precision vs. quantized model
  10. Use your quantized model in Ollama (see the Ollama sketch after this list)
  11. Conclusion
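
Steps 6 and 7 boil down to two llama.cpp invocations: a conversion script that rewrites the Hugging Face checkpoint as GGUF, and a quantization binary that shrinks it. Below is a minimal sketch that drives both from Python; the checkpoint directory and output file names are hypothetical, and the script and binary names (`convert_hf_to_gguf.py`, `llama-quantize`) match a recent llama.cpp checkout but have changed across versions.

```python
# Sketch of steps 6-7: convert a Hugging Face checkpoint to GGUF, then quantize it.
# Assumes a local clone of llama.cpp (with its Python requirements installed) and
# a built llama-quantize binary; all paths and model names here are hypothetical.
import subprocess

HF_MODEL_DIR = "./my-finetuned-model"   # hypothetical local HF checkpoint directory
F16_GGUF = "./model-f16.gguf"
QUANT_GGUF = "./model-Q4_K_M.gguf"

# 1) Convert the HF weights to a half-precision GGUF file.
subprocess.run(
    [
        "python", "llama.cpp/convert_hf_to_gguf.py",
        HF_MODEL_DIR,
        "--outfile", F16_GGUF,
        "--outtype", "f16",
    ],
    check=True,
)

# 2) Quantize the f16 GGUF down to 4-bit (Q4_K_M). The binary's location
#    depends on how the repo was built (often build/bin/llama-quantize).
subprocess.run(
    ["llama.cpp/llama-quantize", F16_GGUF, QUANT_GGUF, "Q4_K_M"],
    check=True,
)
```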
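
For step 8, the quantized GGUF file can be loaded directly with the llama-cpp-python bindings. A minimal sketch, assuming `pip install llama-cpp-python` and the hypothetical model path from the conversion sketch above:

```python
# Sketch of step 8: load the quantized GGUF with llama-cpp-python and run a prompt.
from llama_cpp import Llama

llm = Llama(
    model_path="./model-Q4_K_M.gguf",  # the quantized file produced above
    n_ctx=2048,        # context window size
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
    verbose=False,
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF quantization does."}],
    max_tokens=128,
    temperature=0.1,
)
print(response["choices"][0]["message"]["content"])
```

`create_chat_completion` returns an OpenAI-style response dict, which is convenient for step 9: the same prompts can be sent to the full-precision and quantized models and the outputs compared side by side.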
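
For step 10, Ollama can serve the same GGUF file once it is registered through a Modelfile. A sketch, assuming a local Ollama installation; the model name and file path are hypothetical:

```python
# Sketch of step 10: register the quantized GGUF with Ollama via a minimal Modelfile.
from pathlib import Path
import subprocess

# A one-line Modelfile pointing at the quantized GGUF is enough to get started.
Path("Modelfile").write_text("FROM ./model-Q4_K_M.gguf\n")

# Create the model from the Modelfile, then chat with it.
subprocess.run(["ollama", "create", "my-quantized-model", "-f", "Modelfile"], check=True)
subprocess.run(["ollama", "run", "my-quantized-model", "Hello!"], check=True)
```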
