Overview
Watch a technical talk exploring optimization strategies for running Large Language Models (LLMs) on Arm CPUs, presented by Dibakar Gope, Principal Engineer in Arm's Machine Learning & AI division. Learn about advanced techniques for accelerating LLM inference on commodity Arm processors, focusing on matrix multiplication optimizations with low numerical precision and compression methods that minimize memory traffic. Discover how to combine the SDOT and SMMLA instructions with 4-bit quantization schemes to enable efficient LLM deployment on smartphones and edge devices, making advanced AI capabilities accessible on billions of compact computing devices.
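To make the idea concrete, here is a minimal sketch (not Arm's actual kernels, and all function names are illustrative assumptions) of the pattern the talk covers: weights are quantized to signed 4-bit integers with a per-row scale, the dot product is accumulated entirely in integer arithmetic (the work SDOT/SMMLA perform in hardware), and the accumulator is rescaled back to floating point at the end.

```python
def quantize_4bit(values):
    """Map floats to signed 4-bit ints in [-8, 7] with a shared scale.
    Illustrative sketch; real kernels use blockwise scales and packing."""
    scale = max(abs(v) for v in values) / 7.0 or 1.0  # avoid zero scale
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale

def int_dot(q_weights, q_acts):
    """Integer multiply-accumulate, the operation SDOT/SMMLA
    vectorize in hardware on Arm CPUs."""
    return sum(int(w) * int(a) for w, a in zip(q_weights, q_acts))

def dequantize(accumulator, w_scale, a_scale):
    """Rescale the integer accumulator back to floating point."""
    return accumulator * w_scale * a_scale

# Example: a quantized dot product approximating the float result.
w = [0.12, -0.5, 0.33, 0.7]
a = [1.0, 2.0, -1.0, 0.5]
qw, ws = quantize_4bit(w)
qa, sa = quantize_4bit(a)  # activations typically use 8 bits in practice
approx = dequantize(int_dot(qw, qa), ws, sa)
exact = sum(x * y for x, y in zip(w, a))
```

Because the inner loop touches only packed 4-bit weights and integer accumulators, it both shrinks memory traffic and exploits the wide integer dot-product units, which is the core of the speedups discussed in the talk.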
Syllabus
GenAI on the Edge Forum: Optimizing Large Language Model (LLM) Inference for Arm CPUs
Taught by
EDGE AI FOUNDATION