Overview
Learn how to quantize Large Language Models (LLMs) with llama.cpp in this 12-minute tutorial video, which shows how to run these models efficiently on laptops and small devices without a GPU. Follow a practical demonstration of quantizing a fine-tuned 2-billion-parameter Gemma model on a MacBook; the same steps apply to any fine-tuned LLM. The tutorial covers installing llama.cpp, an open-source C/C++ library, the preliminaries of model quantization, and pushing LLMs to the Hugging Face Hub. The instructor, a Machine Learning researcher with 15 years of software engineering experience, walks through the complete process from introduction to conclusion.
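For readers new to the idea, the following is a minimal sketch of what quantization does to model weights. It is an illustration only, not the method shown in the video and not llama.cpp's actual storage format: it assumes a simple symmetric 8-bit scheme in which each block of weights shares one floating-point scale, and the names BLOCK_SIZE, quantize_block, and dequantize_block are hypothetical.

```python
from typing import List, Tuple

# Assumed block size for illustration; real formats choose their own layout.
BLOCK_SIZE = 32


def quantize_block(values: List[float]) -> Tuple[float, List[int]]:
    """Map floats to integer codes in [-127, 127] that share one scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        # An all-zero (or empty) block needs no scale.
        return 0.0, [0] * len(values)
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block(scale: float, codes: List[int]) -> List[float]:
    """Recover approximate floats from the integer codes."""
    return [scale * c for c in codes]


def quantize(weights: List[float]) -> List[Tuple[float, List[int]]]:
    """Split the weights into blocks and quantize each block independently."""
    return [
        quantize_block(weights[i:i + BLOCK_SIZE])
        for i in range(0, len(weights), BLOCK_SIZE)
    ]


def dequantize(blocks: List[Tuple[float, List[int]]]) -> List[float]:
    """Concatenate the dequantized blocks back into one weight list."""
    restored: List[float] = []
    for scale, codes in blocks:
        restored.extend(dequantize_block(scale, codes))
    return restored


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    blocks = quantize(weights)
    restored = dequantize(blocks)
    for original, approx in zip(weights, restored):
        print(f"{original:+.4f} -> {approx:+.4f}")
```

Each weight is stored as a small integer plus a per-block scale, which is where the memory savings come from; the price is a rounding error of at most half a scale step per weight.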
Syllabus
Introduction
Push LLM to Hugging Face Hub
llama.cpp
llama.cpp installation
Preliminaries
Quantization
Conclusion
Taught by
AI Bites