Class Central Classrooms: YouTube videos curated by Class Central.
Classroom Contents
Designing High-Performance Scalable Middleware for HPC, AI, and Data Science in Exascale Systems and Clouds
- 1 Intro
- 2 Supporting Programming Models for Multi-Petaflop and Exaflop Systems: Challenges
- 3 Designing (MPI+X) Programming Models at Exascale
- 4 Overview of the MVAPICH2 Project
- 5 MVAPICH2 Release Timeline and Downloads
- 6 Architecture of MVAPICH2 Software Family for HPC, DL/ML, and Data Science
- 7 Highlights of MVAPICH2 2.3.6-GA Release
- 8 Startup Performance on TACC Frontera
- 9 Performance of Collectives with SHARP on TACC Frontera
- 10 Performance Engineering Applications using MVAPICH2 and TAU
- 11 Overview of Some of the MVAPICH2-X Features
- 12 Impact of DC Transport Protocol on Neuron
- 13 Cooperative Rendezvous Protocols
- 14 Benefits of the New Asynchronous Progress Design: Broadwell + InfiniBand
- 15 Shared Address Space (XPMEM)-based Collectives Design
- 16 MVAPICH2-GDR 2.3.6
- 17 Highlights of Some MVAPICH2-GDR Features for HPC, DL, ML and Data Science
- 18 MVAPICH2-GDR with CUDA-aware MPI Support
- 19 Performance with On-the-fly Compression Support in MVAPICH2-GDR
- 20 Collectives Performance on DGX2-A100 - Small Message
- 21 MVAPICH2 (MPI)-driven Infrastructure for ML/DL Training
- 22 Distributed TensorFlow on ORNL Summit (1,536 GPUs)
- 23 Distributed TensorFlow on TACC Frontera (2048 CPU nodes)
- 24 PyTorch, Horovod and DeepSpeed at Scale: Training ResNet-50 on 256 V100 GPUs
- 25 Dask Architecture
- 26 Benchmark #1: Sum of cupy Array and its Transpose (12)
- 27 Benchmark #2: cuDF Merge (TACC Frontera GPU Subsystem)
- 28 MVAPICH2-GDR Upcoming Features for HPC and DL
- 29 Funding Acknowledgments
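
Item 18 above covers MVAPICH2-GDR's CUDA-aware MPI support, where GPU device pointers can be passed directly to standard MPI calls instead of being staged through host memory. The following minimal sketch illustrates the general pattern; the two-rank setup, buffer size, and message tag are illustrative assumptions and are not taken from the talk.

```c
/* Minimal sketch of a CUDA-aware MPI exchange (as offered by libraries
 * such as MVAPICH2-GDR): device pointers go straight into MPI calls.
 * Run with at least two ranks, e.g. `mpirun -np 2 ./a.out`. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "needs at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    const int count = 1 << 20;                 /* 1M doubles, illustrative */
    double *d_buf = NULL;
    cudaMalloc((void **)&d_buf, count * sizeof(double));

    if (rank == 0) {
        /* Send directly from GPU memory; the CUDA-aware library handles
         * the data movement (e.g. via GPUDirect RDMA when available). */
        MPI_Send(d_buf, count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Receive directly into GPU memory. */
        MPI_Recv(d_buf, count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d doubles into device memory\n", count);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```

With a non-CUDA-aware MPI library the same exchange would need explicit cudaMemcpy staging through host buffers; avoiding that staging is the point of the GDR designs benchmarked in items 18-20.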