Overview
Watch a technical presentation in which ByteDance and Broadcom architects explain how Scheduled Ethernet Fabric technology optimizes large-scale AI training clusters. The talk covers the architecture for connecting tens of thousands of GPUs efficiently, including extensive GPU scale-out, multi-tenancy for diverse parallel workloads, and networking that stays resilient to failures. It also presents ByteDance's real-world benchmarking results and deployment experience with the fabric, and discusses why open ecosystems matter for continued innovation in AI infrastructure. Along the way, it outlines the key requirements for high-performance network fabrics that maximize computational power across massive GPU clusters running varied AI workloads.
Syllabus
Scheduled Ethernet Fabric for Large scale AI training cluster
Taught by
Open Compute Project