MobileLLM - Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
EDGE AI FOUNDATION via YouTube
Overview
Watch a 20-minute research presentation exploring the development of MobileLLM, an approach to deploying efficient large language models on mobile devices. Learn how deep-and-thin architectures, embedding sharing, and grouped-query attention enable high-performing language models with fewer than a billion parameters. Discover how these optimizations yield significant accuracy improvements over previous state-of-the-art models on commonsense reasoning tasks, with 2.7% and 4.3% boosts for the 125M and 350M parameter models, respectively. Understand how this architectural innovation challenges the conventional wisdom that data and parameter quantity are the primary drivers of model quality, while demonstrating performance comparable to much larger models in practical applications such as API calling tasks.
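The techniques named above can be made concrete with a toy parameter-budget calculation. The sketch below is purely illustrative: the dimensions, vocabulary size, and layer counts are assumptions chosen to land near 125M parameters, not MobileLLM's published configuration. It shows how tying the input and output embeddings and reducing the number of key/value heads (grouped-query attention) free up budget that can be spent on depth.

```python
# Toy parameter accounting for a small decoder-only transformer.
# All dimensions are illustrative assumptions, NOT MobileLLM's exact config.

def transformer_params(vocab, d_model, n_layers, n_heads, n_kv_heads,
                       ffn_mult=4, tie_embeddings=True):
    """Rough parameter count (biases and norms ignored for simplicity)."""
    head_dim = d_model // n_heads
    # Attention: Q and output projections use the full width; with
    # grouped-query attention, K and V use only n_kv_heads heads.
    attn = d_model * d_model                        # Q projection
    attn += 2 * d_model * (n_kv_heads * head_dim)   # K and V projections
    attn += d_model * d_model                       # output projection
    # Two feed-forward matrices: up- and down-projection.
    ffn = 2 * d_model * (ffn_mult * d_model)
    # Embedding sharing ties the input table to the output head,
    # so the vocab matrix is counted once instead of twice.
    emb = vocab * d_model * (1 if tie_embeddings else 2)
    return emb + n_layers * (attn + ffn)

# A deep-and-thin model with tied embeddings and 3 KV heads for 9 query heads.
tied = transformer_params(32000, 576, 30, 9, 3)
# Same architecture without embedding sharing.
untied = transformer_params(32000, 576, 30, 9, 3, tie_embeddings=False)

print(f"tied:   {tied:,}")    # lands near the 125M class
print(f"untied: {untied:,}")  # embedding table counted twice
print(f"saved by sharing: {untied - tied:,}")
```

Under these assumed dimensions, embedding sharing alone saves a full vocab-by-width matrix (about 18M parameters here), a large fraction of a sub-billion budget, which is why the presentation highlights it as a key lever at this scale.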
Syllabus
GenAI on the Edge Forum: MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Taught by
EDGE AI FOUNDATION