Machine Learning Using Various GPU Technologies with Kubeflow
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore advanced GPU technologies for efficient machine learning in this 32-minute conference talk by Jihye Choi from SAMSUNG SDS. Discover how to optimize GPU utilization and enhance distributed learning in Kubeflow environments. Learn about Multi-Instance GPU (MIG) technology on the NVIDIA A100, which partitions a single GPU into as many as seven isolated instances so that smaller, lighter-weight models can share hardware and use resources more efficiently. Delve into the benefits of GPUDirect RDMA, a high-performance networking technology that lets GPUs exchange data directly between their memories without CPU intervention, improving GPU utilization and throughput in distributed training. Gain valuable insights on combining these technologies with Kubeflow to work around cost and GPU resource constraints as an MLOps practitioner.
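To make the two ideas more concrete, below is a minimal sketch of how a MIG slice and GPUDirect RDMA hints might appear in a training pod submitted to a Kubeflow cluster, written with the official Kubernetes Python client. It is not taken from the talk: the resource name "nvidia.com/mig-1g.5gb" assumes the NVIDIA GPU Operator exposes MIG devices under the "mixed" strategy, the image and namespace are hypothetical, and the NCCL environment variables are common tuning hints whose right values depend on the cluster's network fabric.

```python
# Minimal sketch (assumptions noted above): request one A100 MIG slice and set
# NCCL hints that favor GPUDirect RDMA when the fabric supports it.
from kubernetes import client, config


def create_mig_training_pod():
    config.load_kube_config()  # use config.load_incluster_config() when running inside the cluster

    container = client.V1Container(
        name="trainer",
        image="my-registry/simple-model-trainer:latest",  # hypothetical training image
        command=["python", "train.py"],
        resources=client.V1ResourceRequirements(
            # One 1g.5gb MIG instance; a single A100 can expose up to seven of these.
            limits={"nvidia.com/mig-1g.5gb": "1"},
        ),
        env=[
            # Keep InfiniBand enabled and allow GPU-direct transfers up to the
            # host-bridge level; exact values depend on topology and NIC placement.
            client.V1EnvVar(name="NCCL_IB_DISABLE", value="0"),
            client.V1EnvVar(name="NCCL_NET_GDR_LEVEL", value="PHB"),
        ],
    )

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="mig-trainer", namespace="kubeflow-user"),
        spec=client.V1PodSpec(restart_policy="Never", containers=[container]),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="kubeflow-user", body=pod)


if __name__ == "__main__":
    create_mig_training_pod()
```

In practice the same resource limit and environment variables would typically be placed in a Kubeflow training job spec (for example a PyTorchJob or TFJob) rather than a bare pod; the pod form is used here only to keep the sketch short.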
Syllabus
Machine Learning Using Various GPU Technology With Kubeflow - Jihye Choi, SAMSUNG SDS
Taught by
CNCF [Cloud Native Computing Foundation]