Overview
Syllabus
Intro
Supporting Programming Models for Multi-Petaflop and Exaflop Systems: Challenges
Designing (MPI+X) Programming Models at Exascale
Overview of the MVAPICH2 Project
MVAPICH2 Release Timeline and Downloads
Architecture of MVAPICH2 Software Family for HPC, DL/ML, and Data Science
Highlights of MVAPICH2 2.3.6-GA Release
Startup Performance on TACC Frontera
Performance of Collectives with SHARP on TACC Frontera
Performance Engineering Applications using MVAPICH2 and TAU
Overview of Some of the MVAPICH2-X Features
Impact of DC Transport Protocol on NEURON
Cooperative Rendezvous Protocols
Benefits of the New Asynchronous Progress Design: Broadwell + InfiniBand
Shared Address Space (XPMEM)-based Collectives Design
MVAPICH2-GDR 2.3.6
Highlights of Some MVAPICH2-GDR Features for HPC, DL, ML and Data Science
MVAPICH2-GDR with CUDA-aware MPI Support (see the sketch after this syllabus)
Performance with On-the-fly Compression Support in MVAPICH2-GDR
Collectives Performance on DGX A100 - Small Message
MVAPICH2 (MPI)-driven Infrastructure for ML/DL Training
Distributed TensorFlow on ORNL Summit (1,536 GPUs)
Distributed TensorFlow on TACC Frontera (2,048 CPU nodes)
PyTorch, Horovod and DeepSpeed at Scale: Training ResNet-50 on 256 V100 GPUs (see the sketch after this syllabus)
Dask Architecture
Benchmark #1: Sum of CuPy Array and its Transpose (see the sketch after this syllabus)
Benchmark #2: cuDF Merge (TACC Frontera GPU Subsystem)
MVAPICH2-GDR Upcoming Features for HPC and DL
Funding Acknowledgments
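
The "MVAPICH2-GDR with CUDA-aware MPI Support" topic above is about passing GPU-resident buffers directly to MPI calls. Below is a minimal sketch of that usage pattern, assuming an mpi4py build (3.1 or later, for CUDA array interface support) linked against a CUDA-aware MPI library such as MVAPICH2-GDR; mpi4py, CuPy, and the two-rank layout are illustrative choices, not materials from the course.

    # Minimal sketch, assuming mpi4py >= 3.1 built against a CUDA-aware MPI
    # such as MVAPICH2-GDR. Launch with two ranks, e.g.:
    #   mpirun -np 2 python cuda_aware_sketch.py
    from mpi4py import MPI
    import cupy as cp

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    buf = cp.arange(1 << 20, dtype=cp.float32)   # buffer lives in GPU memory

    if rank == 0:
        # The device pointer is handed straight to MPI; a CUDA-aware
        # library moves the data without an explicit host staging copy.
        comm.Send(buf, dest=1, tag=11)
    elif rank == 1:
        out = cp.empty_like(buf)
        comm.Recv(out, source=0, tag=11)
        cp.cuda.get_current_stream().synchronize()
        print("rank 1 checksum:", float(out.sum()))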
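For the "PyTorch, Horovod and DeepSpeed at Scale" topic, here is a skeleton of Horovod-style data-parallel training in PyTorch. It is not the course's training script: the Linear layer stands in for ResNet-50, and the loss and hyperparameters are placeholders.

    # Skeleton of Horovod data-parallel training in PyTorch.
    # Launch with e.g.: horovodrun -np 4 python hvd_sketch.py
    import torch
    import horovod.torch as hvd

    hvd.init()
    torch.cuda.set_device(hvd.local_rank())      # one GPU per rank

    model = torch.nn.Linear(1024, 1000).cuda()   # stand-in for ResNet-50
    opt = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
    opt = hvd.DistributedOptimizer(opt, named_parameters=model.named_parameters())
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)

    for step in range(10):
        x = torch.randn(32, 1024, device="cuda")
        loss = model(x).square().mean()          # dummy loss
        opt.zero_grad()
        loss.backward()                          # gradients allreduced by Horovod
        opt.step()

Under MVAPICH2-GDR, the allreduce traffic that Horovod generates during backward() is what the GPU-optimized collectives in the talk accelerate.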
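For "Benchmark #1", a single-GPU CuPy analogue of the sum-of-an-array-and-its-transpose operation. The syllabus lists the benchmark alongside the Dask material, so the course version presumably runs distributed across workers; this sketch only shows the core kernel, with an assumed matrix size and CUDA-event timing.

    # Single-GPU CuPy analogue of the sum-of-array-and-transpose kernel;
    # the matrix size and the timing loop are illustrative assumptions.
    import cupy as cp

    n = 4096
    a = cp.random.rand(n, n, dtype=cp.float64)

    start, stop = cp.cuda.Event(), cp.cuda.Event()
    start.record()
    b = a + a.T                                  # element-wise sum on the GPU
    stop.record()
    stop.synchronize()
    print("elapsed:", cp.cuda.get_elapsed_time(start, stop), "ms")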
Taught by
Linux Foundation