Accelerating Neural Recommendation Training with Embedding Scheduling

Overview

Explore an approach to accelerating distributed Deep Learning Recommendation Model (DLRM) training without compromising model accuracy. Learn about embedding scheduling, which proactively decides where each embedding should be trained and which embeddings need to be synchronized. Meet Herald, a real-time embedding scheduler that raises embedding cache hit rates and eliminates unnecessary updates, substantially reducing communication overhead. See how the method exploits a property of distributed training systems: accesses to in-cache embeddings are predictable and infrequent. Examine the performance gains delivered by its adaptive location-aware input allocation and its optimal communication plan generation. Understand how these techniques substantially reduce embedding transmissions and improve DLRM training performance across various network configurations.
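The following minimal Python sketch illustrates why location-aware input allocation can cut embedding transmissions. It is not Herald's algorithm. The functions `allocate_inputs` and `round_robin_misses`, the greedy "most cached overlap" rule, and the per-worker capacity cap are all hypothetical simplifications. Each training input is modeled as the set of embedding IDs it looks up and each worker's cache as a set of cached IDs. Every ID missing from the chosen worker's cache counts as one remote fetch.

```python
def allocate_inputs(samples, worker_caches):
    """Greedily assign each input to a cache-friendly worker (hypothetical sketch).

    samples: list of sets, the embedding IDs each training input looks up.
    worker_caches: list of sets, the embedding IDs cached on each worker.
    Returns (assignment, misses): the chosen worker per input and the total
    number of embedding IDs that would have to be fetched remotely.
    """
    n = len(worker_caches)
    capacity = -(-len(samples) // n)  # ceil(len/n): cap per-worker load to keep batches balanced
    load = [0] * n
    assignment, misses = [], 0
    for ids in samples:
        # Only workers with spare capacity in this iteration are eligible.
        candidates = [w for w in range(n) if load[w] < capacity]
        # Prefer the worker whose cache already holds most of this input's embeddings.
        best = max(candidates, key=lambda w: len(ids & worker_caches[w]))
        assignment.append(best)
        load[best] += 1
        misses += len(ids - worker_caches[best])  # IDs not cached locally
    return assignment, misses


def round_robin_misses(samples, worker_caches):
    """Baseline: cache-oblivious round-robin assignment, counting remote fetches."""
    n = len(worker_caches)
    return sum(len(ids - worker_caches[i % n]) for i, ids in enumerate(samples))


caches = [{1, 2, 3}, {4, 5, 6}]             # caches of worker 0 and worker 1
samples = [{4, 5}, {1, 2}, {5, 6}, {3, 7}]  # embedding IDs per training input
print(allocate_inputs(samples, caches))     # ([1, 0, 1, 0], 1)
print(round_robin_misses(samples, caches))  # 8
```

In this toy case the greedy assignment needs 1 remote fetch versus 8 for round-robin. A real scheduler must also decide which cached embeddings to synchronize and when, which this sketch ignores.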

Syllabus

NSDI '24 - Accelerating Neural Recommendation Training with Embedding Scheduling

Taught by

USENIX

Related courses

OPER: Optimality-Guided Embedding Table Parallelization for Large-scale Recommendation Models

THC - Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression

BGL - GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing