Unlocking New Pose in HPC - Containerization, Cloud, and GPU-based Workloads
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This conference talk covers how containerization, cloud technologies, and GPU-based workloads are being integrated into High-Performance Computing (HPC) environments. Learn how Kubernetes enables unified management of heterogeneous compute, network, and storage resources. Discover GPU virtualization techniques that create shared resource pools for fine-grained quota management and multi-tenant sharing. Understand how custom Kubernetes schedulers are implemented for priority-based GPU task management. Gain insight into visual GPU monitoring with Prometheus, which provides both aggregated performance metrics and fine-grained monitoring. Finally, explore the creation of a one-stop scientist workbench for end-to-end algorithm development, model training, and AI service deployment in the cloud, built on mainstream AI frameworks scheduled by Kubernetes.
Syllabus
Unlocking New Pose in HPC - Containerization, Cloud, and GPU-based Workloads - Ying Xu & Xianglong Zeng
Taught by
CNCF [Cloud Native Computing Foundation]