Designing High-Performance Scalable Middleware for HPC, AI, and Data Science in Exascale Systems and Clouds

Linux Foundation via YouTube

Classroom Contents

  1. Intro
  2. Supporting Programming Models for Multi-Petaflop and Exaflop Systems: Challenges
  3. Designing (MPX) Programming Models at Exascale
  4. Overview of the MVAPICH2 Project
  5. MVAPICH2 Release Timeline and Downloads
  6. Architecture of MVAPICH2 Software Family for HPC, DL/ML, and Data Science
  7. Highlights of MVAPICH2 2.3.6-GA Release
  8. Startup Performance on TACC Frontera
  9. Performance of Collectives with SHARP on TACC Frontera
  10. Performance Engineering Applications using MVAPICH2 and TAU
  11. Overview of Some of the MVAPICH2-X Features
  12. Impact of DC Transport Protocol on Neuron
  13. Cooperative Rendezvous Protocols
  14. Benefits of the New Asynchronous Progress Design: Broadwell + InfiniBand
  15. Shared Address Space (XPMEM)-based Collectives Design
  16. MVAPICH2-GDR 2.3.6
  17. Highlights of some MVAPICH2-GDR Features for HPC, DL, ML and Data Science
  18. MVAPICH2-GDR with CUDA-aware MPI Support
  19. Performance with On-the-fly Compression Support in MVAPICH2-GDR
  20. Collectives Performance on DGX2-A100 - Small Message
  21. MVAPICH2 (MPI)-driven Infrastructure for ML/DL Training
  22. Distributed TensorFlow on ORNL Summit (1,536 GPUs)
  23. Distributed TensorFlow on TACC Frontera (2048 CPU nodes)
  24. PyTorch, Horovod and DeepSpeed at Scale: Training ResNet-50 on 256 V100 GPUs
  25. Dask Architecture
  26. Benchmark #1: Sum of CuPy Array and its Transpose (12)
  27. Benchmark #2: cuDF Merge (TACC Frontera GPU Subsystem)
  28. MVAPICH2-GDR Upcoming Features for HPC and DL
  29. Funding Acknowledgments
