Quantizing LLMs and Converting to GGUF Format for Faster and Smaller Models
- 1 - Welcome
- 2 - Text tutorial on MLExpert.io
- 3 - Fine-tuned model on HuggingFace
- 4 - Why quantize your model?
- 5 - Google Colab Setup
- 6 - Install llama.cpp
- 7 - Convert HF model to GGUF
- 8 - Run the quantized model with llama-cpp-python
- 9 - Evaluate full-precision vs quantized model
- 10 - Use your quantized model in Ollama
- 11 - Conclusion
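To preview the idea behind steps 4 and 7 above, here is a minimal, illustrative sketch of what quantization does to model weights. This is not llama.cpp's actual GGUF scheme (which uses block-wise formats such as Q4_K_M); it is a simplified symmetric round-to-nearest 4-bit quantizer, shown only to make the size/precision trade-off concrete.

```python
# Illustrative sketch only, NOT llama.cpp's exact GGUF quantization:
# symmetric round-to-nearest 4-bit quantization of a block of float
# weights, then dequantization to measure the reconstruction error.

def quantize_q4(weights):
    """Map floats to signed 4-bit integers (-7..7) plus one shared float scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_q4(q, scale):
    """Recover approximate floats from the 4-bit integers."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.91, -0.08, 0.44, -0.99, 0.27, 0.66]
q, scale = quantize_q4(weights)
restored = dequantize_q4(q, scale)

# Each weight now needs 4 bits instead of 32 (plus one shared scale),
# at the cost of a small per-weight reconstruction error.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print("quantized:", q)
print(f"max error: {max_err:.4f}")
```

This 8x-smaller representation with a bounded error per block is, in spirit, why the quantized GGUF model in the later videos runs faster and fits in far less memory than the full-precision HuggingFace checkpoint.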