

SHADE - Enable Fundamental Cacheability for Distributed Deep Learning Training

USENIX via YouTube

Overview

Explore an approach to optimizing distributed deep learning training (DLT) in this conference talk from FAST '23. Dive into SHADE, a DLT-aware caching system that addresses the I/O performance bottleneck in accelerator-driven environments. Learn how SHADE leverages importance sampling to detect fine-grained variations in importance at the per-sample level and to make informed caching decisions for distributed DLT jobs. Discover the rank-based approach that captures relative importance across minibatches and dynamically updates importance scores during training. Examine the improvements in cache hit ratio and overall training performance that SHADE achieves, particularly for computer vision models. Gain insights into the challenges posed by rapidly growing dataset sizes and the distinctive I/O workload behavior of DLT applications, and understand how SHADE's techniques can inform storage system design for deep learning.

Syllabus

FAST '23 - SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training

Taught by

USENIX

