

Optimizing ML Model Inference for Production AI - Strategies for Latency, Throughput, and Cost

AWS Events via YouTube

Overview

Discover optimization strategies for ML model inference across the entire technology stack in this AWS re:Invent lightning talk. Explore Baseten's approach to improving latency, throughput, and cost efficiency for AI-native products in production. Gain insight into both applied model performance research and distributed GPU infrastructure, and how the two disciplines intersect to support mission-critical inference workloads for businesses of all sizes. Learn from Baseten, an AWS Partner, about practical techniques that drive better performance and cost-effectiveness in AI deployments.

Syllabus

AWS re:Invent 2024 - Faster, cheaper, better: Optimizing inference for production AI (AIM248)

Taught by

AWS Events
