AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
MIT HAN Lab via YouTube
Overview
This 19-minute conference talk from MLSys 2024 presents the Best Paper "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration," by researchers from MIT HAN Lab. The talk introduces Activation-aware Weight Quantization (AWQ), a technique for compressing and accelerating Large Language Models (LLMs), and covers the methodology, results, and implications of the work. Additional resources, including the project website, full paper, and code repository, are available for viewers who want to study AWQ further or implement it in their own projects.
Syllabus
MLSys'24 Best Paper - AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Taught by
MIT HAN Lab