

Telemetry-Based Load Balancing of AI/ML Workloads in Self-Healing Networks

Open Compute Project via YouTube

Overview

Learn how Tencent implemented a self-healing network for AI/ML workloads in this 19-minute technical presentation from Broadcom experts. Explore the unique challenges of AI/ML network traffic: compared with traditional workloads, it consists of fewer flows, each consuming significant bandwidth, so links saturate quickly while the fabric must still remain lossless and low-latency. Discover how Ethernet-based technologies and the SAI/SONiC ecosystem are used alongside Broadcom's networking solutions to maintain performance. Gain insight into the implementation of in-band telemetry and packet drop monitoring, and understand how applications use granular network telemetry data to dynamically optimize load balancing for AI/ML workload flows.

Syllabus

Telemetry-based load balancing of AI/ML workloads

Taught by

Open Compute Project

