

Source Routing for AI Fabrics - Optimizing Network Traffic in Multi-tenant AI Clusters

Open Compute Project via YouTube

Overview

Learn about an approach to scheduling AI workloads in Ethernet fabrics in this technical presentation from Marvell's Kishore Atreya and Prathyaya Bhandarkar. Explore how a source routing framework can address the challenges of large-scale, multi-tenant AI clusters, where high tail latency and jitter degrade training performance. Discover a simplified solution that uses the Switch Abstraction Interface (SAI) to predetermine flow paths and program them across access nodes, taking advantage of the predictability of AI training flows. Examine how software controllers can engineer traffic flows between training elements to optimize bandwidth utilization, load, and latency, ultimately reducing network cost and power requirements compared with traditional fabric scheduling approaches. Gain insight into avoiding congestion in AI infrastructure without the complexity and unpredictable behavior of alternatives such as enhanced congestion control, load balancing, packet spraying, and fabric scheduling.
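
To make the idea of predetermined flow paths concrete, here is a minimal sketch of how a software controller might pin known training flows to explicit paths through a two-tier leaf-spine fabric. It is an illustration under stated assumptions, not the method from the presentation: the function name `assign_paths`, the flow tuples, the node names, and the greedy least-loaded heuristic are all hypothetical, and the step of programming the resulting hop lists into switches through SAI is not shown.

```python
from collections import defaultdict

def assign_paths(flows, spines):
    """Pin each flow to an explicit leaf -> spine -> leaf path.

    flows:  list of (flow_id, src_leaf, dst_leaf, demand_gbps) tuples (hypothetical format)
    spines: list of spine switch names (hypothetical)
    Returns (routes, link_load), where routes maps flow_id to its hop list.
    """
    link_load = defaultdict(float)  # (node_a, node_b) -> offered load in Gb/s
    routes = {}
    # Place the largest flows first; sorted() is stable, so ties keep input order.
    for flow_id, src, dst, demand in sorted(flows, key=lambda f: -f[3]):
        # Choose the spine whose busier link on this path carries the least load;
        # break ties by spine name so the result is deterministic.
        best = min(
            spines,
            key=lambda s: (max(link_load[(src, s)], link_load[(s, dst)]), s),
        )
        link_load[(src, best)] += demand
        link_load[(best, dst)] += demand
        routes[flow_id] = [src, best, dst]  # the source route for this flow
    return routes, dict(link_load)

# Hypothetical flows between leaf switches, known ahead of time by the controller.
flows = [
    ("f1", "leaf1", "leaf2", 400.0),
    ("f2", "leaf1", "leaf2", 400.0),
    ("f3", "leaf1", "leaf3", 200.0),
]
routes, load = assign_paths(flows, ["spine1", "spine2"])
for flow_id, hops in routes.items():
    print(flow_id, " -> ".join(hops))
```

Because the flows are known before they start, the two 400 Gb/s flows land on different spines by construction instead of depending on per-packet hashing. A real controller would also need to model link capacities, failures, and tenant isolation, all of which this sketch leaves out.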

Syllabus

Source Routing for AI Fabrics

Taught by

Open Compute Project
