

LLM Pipelines: Seamless Integration on Embedded Devices - Optimizing Large Language Models for Edge Computing

EDGE AI FOUNDATION via YouTube

Overview

Watch a technical presentation exploring the deployment of Large Language Models (LLMs) on embedded devices through NXP's LLM Pipelines project. Learn how quantization and fine-tuning improve LLM deployment on NXP's high-end MPUs, including the i.MX 8M Plus, i.MX 93, and i.MX 95. Discover how quantization can shrink model size and speed up execution while preserving accuracy, particularly in auto-regressive models. Explore the implementation of Retrieval Augmented Generation (RAG) for specialized use cases such as in-car assistants, including methods for handling hardware constraints and rejecting out-of-topic queries. Gain insight into comparing different quantization approaches and addressing RAG-related challenges in embedded systems development.
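To make the quantization idea concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization, the basic mechanism behind the size and speed gains discussed in the talk. This is an illustrative toy, not NXP's pipeline: the weight matrix, scale computation, and function names are all assumptions for demonstration.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: map float32 weights to [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for accuracy comparison."""
    return q.astype(np.float32) * scale

# Toy weight matrix: INT8 storage is 4x smaller than float32,
# and the worst-case rounding error is bounded by half the scale.
w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
err = float(np.abs(dequantize(q, scale) - w).max())
print(q.nbytes, w.nbytes, round(err, 4))
```

Real deployments (e.g. on the i.MX NPUs) typically use per-channel scales and calibration data, but the size/accuracy trade-off follows the same pattern.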
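The RAG behavior described above, retrieving the most relevant document and refusing out-of-topic queries, can be sketched with a similarity threshold. This toy uses a bag-of-words vector as a stand-in for a real embedding model (which an on-device assistant would use instead); the documents, threshold value, and function names are assumptions for illustration.

```python
import numpy as np

# Tiny in-memory "knowledge base" for a hypothetical car assistant.
docs = [
    "To adjust the climate control, turn the temperature dial.",
    "The lane assist system warns when the car drifts out of lane.",
]

def tokens(text: str):
    return text.lower().replace(",", "").replace(".", "").split()

vocab = sorted({w for d in docs for w in tokens(d)})

def embed(text: str) -> np.ndarray:
    # Normalized bag-of-words over the document vocabulary; a real
    # system would call an embedding model here.
    v = np.array([float(tokens(text).count(w)) for w in vocab])
    n = np.linalg.norm(v)
    return v / n if n else v

doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query: str, threshold: float = 0.3):
    sims = doc_vecs @ embed(query)
    best = int(np.argmax(sims))
    if float(sims[best]) < threshold:
        return None  # out-of-topic: fall back to a refusal or generic answer
    return docs[best]
```

The threshold is the key design choice: too low and unrelated questions get a confident (wrong) context, too high and valid queries are refused, a tension that is sharper on embedded devices where the fallback LLM is small.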

Syllabus

Introduction
LLM Pipelines
Metrics
Fine Tuning
Quantization
Conclusion
Strategic Partners

Taught by

EDGE AI FOUNDATION
