Optimizing ML Model Inference for Production AI - Strategies for Latency, Throughput, and Cost
AWS Events via YouTube
Overview
Discover optimization strategies for ML model inference across the entire technology stack in this AWS re:Invent lightning talk. Dive into Baseten's comprehensive approach to reducing latency, increasing throughput, and lowering costs for AI-native products in production environments. Gain insight into both applied model performance research and distributed GPU infrastructure, and see how these disciplines intersect to support mission-critical inference workloads for businesses of all sizes. Learn from Baseten, an AWS Partner, about practical techniques and methodologies that drive better performance and cost-effectiveness in AI deployments.
Syllabus
AWS re:Invent 2024 - Faster, cheaper, better: Optimizing inference for production AI (AIM248)
Taught by
AWS Events