LLM Pipelines: Seamless Integration on Embedded Devices - Optimizing Large Language Models for Edge Computing
EDGE AI FOUNDATION via YouTube
Overview
Watch a technical presentation exploring the deployment of Large Language Models (LLMs) on embedded devices through NXP's LLM Pipelines project. Learn about solutions for improving LLM deployment through quantization and fine-tuning, with a focus on NXP's high-end MPUs: the i.MX 8M Plus, i.MX 93, and i.MX 95. Discover how quantization can reduce model size and improve execution time while preserving accuracy, particularly in auto-regressive models. Explore the use of Retrieval Augmented Generation (RAG) for specialized use cases such as an in-car assistant, including methods for handling hardware constraints and managing out-of-topic queries. Gain insight into comparing different quantization approaches and addressing RAG-related challenges in embedded systems development.
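To make the quantization idea concrete, here is a minimal sketch of symmetric int8 post-training quantization, the general class of technique the talk covers for shrinking LLM weights on embedded hardware. The function names and the single-scale-per-tensor scheme are illustrative assumptions, not NXP's actual pipeline API.

```python
def quantize_int8(weights):
    """Map float weights to int8 using one symmetric scale per tensor.

    Storing int8 instead of float32 cuts memory roughly 4x, which is
    the core trade-off discussed for edge deployment.
    """
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the int8 representation."""
    return [v * scale for v in q]


weights = [0.52, -1.27, 0.03, 0.89]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

In practice, per-channel scales and calibration data are used to limit the accuracy loss that the presentation measures against execution time.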
Syllabus
Introduction
LLM Pipelines
Metrics
Fine Tuning
Quantization
Conclusion
Strategic Partners
Taught by
EDGE AI FOUNDATION