CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters

USENIX via YouTube

Overview

This NSDI '24 conference talk presents CASSINI, a network-aware job scheduler for machine learning clusters. CASSINI introduces a geometric abstraction that accounts for the communication patterns of different jobs when placing them on network links. It uses an Affinity graph to find time-shift values that interleave the communication phases of jobs sharing the same network link. In experiments with 13 common ML models on a 24-server testbed, CASSINI improves the average and tail completion times of jobs by up to 1.6x and 2.5x, respectively, compared to state-of-the-art ML schedulers, and reduces the number of ECN-marked packets in the cluster by up to 33x.
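To make the idea of interleaving communication phases concrete, here is a minimal toy sketch. It is not CASSINI's algorithm and does not use the geometric abstraction or the Affinity graph from the talk. It assumes two jobs with the same iteration time, divided into discrete time slots, that share one link, and it brute-forces the delay for the second job that minimizes the number of slots in which both jobs communicate. All function names and parameters (comm_pattern, overlap, best_shift, period, comm_start, comm_len) are hypothetical.

```python
def comm_pattern(period, comm_start, comm_len):
    """Return a 0/1 list over one iteration: 1 where the job uses the link.

    Hypothetical model: each iteration has `period` slots and the job
    communicates during [comm_start, comm_start + comm_len).
    """
    return [1 if comm_start <= t < comm_start + comm_len else 0
            for t in range(period)]


def overlap(a, b, shift):
    """Count slots where both jobs communicate when b is delayed by `shift`.

    Assumes both patterns have the same length (same iteration time).
    """
    n = len(a)
    return sum(a[t] & b[(t - shift) % n] for t in range(n))


def best_shift(a, b):
    """Try every shift of b; return (shift, overlap) with the least overlap."""
    n = len(a)
    return min(((s, overlap(a, b, s)) for s in range(n)), key=lambda p: p[1])


# Two identical jobs: 10-slot iterations, communicating in slots 6..9.
job_a = comm_pattern(period=10, comm_start=6, comm_len=4)
job_b = comm_pattern(period=10, comm_start=6, comm_len=4)

shift, clash = best_shift(job_a, job_b)
print(f"delay job B by {shift} slots -> {clash} overlapping slots")
```

With no shift, the two jobs collide in all four communication slots. Delaying the second job by four slots moves its communication phase into the first job's compute phase, so the shared link is never used by both at once in this toy model.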

Syllabus

NSDI '24 - CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters

Taught by

USENIX
