Explore an approach to accelerating distributed Deep Learning Recommendation Model (DLRM) training without compromising model accuracy. Delve into embedding scheduling, which proactively decides where each embedding is trained and how it is synchronized. Learn about Herald, a real-time embedding scheduler designed to raise cache hit rates and cut unnecessary updates, reducing communication overhead. Discover how the method exploits two properties of distributed training systems: in-cache embedding accesses are predictable and infrequent. Examine the performance gains from adaptive location-aware input allocation and optimal communication plan generation. Gain insight into the resulting reductions in embedding transmissions and the training speedups observed across various network configurations.
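To make the idea of location-aware input allocation concrete, here is a minimal toy sketch in Python. It is not Herald's algorithm: the greedy rule, the function names `allocate_inputs` and `hit_rate`, and the per-worker `capacity` limit are all assumptions made for illustration. Each input sample is a list of embedding IDs, and each worker holds a set of cached IDs; the sketch sends every sample to the worker whose cache already holds the most of its embeddings.

```python
def allocate_inputs(samples, worker_caches, capacity):
    """Greedy toy allocator (hypothetical, not Herald's method).

    samples: list of lists of embedding IDs, one list per input sample.
    worker_caches: list of sets, the embedding IDs cached on each worker.
    capacity: maximum number of samples any single worker may receive.
    Returns a list giving the chosen worker index for each sample.
    """
    load = [0] * len(worker_caches)
    assignment = []
    for ids in samples:
        best, best_hits = None, -1
        for w, cache in enumerate(worker_caches):
            if load[w] >= capacity:
                continue  # this worker's share of the batch is already full
            hits = len(set(ids) & cache)
            if hits > best_hits:
                best, best_hits = w, hits
        if best is None:
            raise ValueError("total capacity is smaller than the batch")
        load[best] += 1
        assignment.append(best)
    return assignment


def hit_rate(samples, assignment, worker_caches):
    """Fraction of embedding lookups served from the assigned worker's cache."""
    hits = total = 0
    for ids, w in zip(samples, assignment):
        hits += sum(1 for i in ids if i in worker_caches[w])
        total += len(ids)
    return hits / total


if __name__ == "__main__":
    caches = [{1, 2, 3}, {4, 5, 6}]
    batch = [[4, 5], [1, 2], [5, 6], [1, 3]]

    round_robin = [i % len(caches) for i in range(len(batch))]
    greedy = allocate_inputs(batch, caches, capacity=2)

    print("round-robin hit rate:", hit_rate(batch, round_robin, caches))
    print("location-aware hit rate:", hit_rate(batch, greedy, caches))
```

In this hand-picked batch, round-robin placement misses every cached embedding while the location-aware placement hits all of them. Every miss would require fetching an embedding over the network, which is the communication cost that embedding scheduling aims to reduce.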