Overview
Learn about the effort to optimize Open MPI and libfabric on Oak Ridge National Laboratory's Frontier supercomputer in this technical presentation. Explore the architecture of a Frontier node, which pairs four AMD MI250X GPUs (eight Graphics Compute Dies in total) with high-bandwidth HPE Slingshot NICs. Discover the collaborative work between the Exascale Computing Project and the Oak Ridge Leadership Computing Facility to improve integration with the CXI provider, develop the new LINKx libfabric provider, and implement further optimizations. Understand key developments, including shared memory operations, IPC communication for ROCm, XPMEM support, and infrastructure improvements aimed at matching the performance of the HPE MPICH software stack. Gain insight into how these enhancements give application developers a robust alternative to the HPE stack that still takes advantage of Frontier's hardware.
Syllabus
Open MPI and Libfabric on the Frontier Supercomputer
Taught by
OpenFabrics Alliance