How to Quantize a Large Language Model with GGUF or AWQ

Overview

Learn how to quantize large language models with GGUF or AWQ in this 26-minute video tutorial. It covers why you would quantize a model, how the main quantization methods differ, and how GGUF, BNB (bitsandbytes), AWQ, and GPTQ compare. Step-by-step walkthroughs show how to quantize a model with AWQ and with GGUF (GGML). The tutorial also points to advanced fine-tuning resources, including scripts for unsupervised and supervised fine-tuning, dataset preparation, and embedding creation, along with presentation slides, GitHub repositories, and related research papers on LLM quantization.

Syllabus

How to quantize a large language model
Why quantize a language model
What is quantization
Which quantization to use?
GGUF vs BNB vs AWQ vs GPTQ
How to quantize with AWQ
How to quantize with GGUF (GGML)
Recap