Platform Performance Optimization for AI - A Resource Management Perspective

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Explore a conference talk on optimizing platform performance for AI workloads from a resource management perspective. Learn how node resource management affects AI workload performance, using a systematic approach that covers goal setting, data collection, analysis, and visualization. Discover practical insights for LLM inference optimization, including how to instrument PyTorch without source code modifications, how to adjust measurements flexibly without rerunning costly benchmarks, and how visualizations can reveal more than traditional numeric metrics alone. Follow along as the speakers present a real-world case study in which resource management strategies improved token throughput per worker node by 3.5x over the baseline. Understand the trade-offs between total throughput and latency, and come away with actionable techniques for optimizing AI platform performance.

Syllabus

Platform Performance Optimization for AI - a Resource Management P... Antti Kervinen & Dixita Narang

Taught by

CNCF [Cloud Native Computing Foundation]
