Explore Exoshuffle, a system for large-scale data processing, in this 26-minute conference talk from Anyscale. Shuffle is a crucial primitive in data processing applications, and Exoshuffle challenges conventional wisdom by implementing a high-performance, reliable shuffle on Ray, a general-purpose distributed computing system. Learn how Exoshuffle outperforms Spark and achieves 82% of theoretical performance on a 100TB sort using 100 nodes. See how Exoshuffle is integrated with the Datasets library in Ray 2.0 to provide large-scale shuffle capabilities for machine learning users. The talk is aimed at data scientists, engineers, and anyone interested in large-scale data processing techniques.