Is Sharing GPU to Multiple Containers Feasible?

Overview

Explore the feasibility of sharing a GPU across multiple containers in this 27-minute conference talk from the Cloud Native Computing Foundation (CNCF). Dive into cost-effective strategies for maximizing GPU utilization in ML workloads, particularly inference tasks. Learn how GPUs are provisioned and attached to containers through Kubernetes device plugins, and discover techniques for extending the NVIDIA device plugin so that multiple ML workloads can be scheduled on a single GPU. Gain insights into collecting GPU information with Prometheus, and understand how GPU sharing can be achieved within Kubernetes itself, without relying on additional technologies like VMware's vGPUs.
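
As a rough illustration of the device-plugin idea mentioned above, the sketch below models one way a plugin could let several containers land on the same GPU: advertise each physical GPU as several schedulable device IDs, then map whatever IDs get allocated back to the physical GPU. This is a simplified Python model, not the NVIDIA device plugin's code or the approach confirmed by the talk. The real device plugin API is gRPC-based, and the names `VirtualDevice`, `advertise`, `visible_gpus`, the `::` ID format, and the `GPU-aaaa` UUID are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualDevice:
    device_id: str  # ID the plugin would advertise (hypothetical "uuid::index" format)
    gpu_uuid: str   # physical GPU that backs this advertised device


def advertise(gpu_uuids, replicas):
    """Advertise every physical GPU as `replicas` schedulable devices."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return [
        VirtualDevice(f"{uuid}::{index}", uuid)
        for uuid in gpu_uuids
        for index in range(replicas)
    ]


def visible_gpus(allocated_ids, devices):
    """Map allocated device IDs back to the distinct physical GPUs behind them."""
    by_id = {device.device_id: device.gpu_uuid for device in devices}
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(by_id[device_id] for device_id in allocated_ids))


# One physical GPU (assumed UUID) advertised as four schedulable devices.
devices = advertise(["GPU-aaaa"], replicas=4)

# Two containers are each handed a different advertised ID ...
first = visible_gpus(["GPU-aaaa::0"], devices)
second = visible_gpus(["GPU-aaaa::3"], devices)
# ... yet both resolve to the same physical GPU.
print(first, second)
```

Note that a scheme like this only changes what the scheduler counts. It does not by itself isolate GPU memory or compute between the containers sharing the device, so it suits workloads such as light inference that are known to fit side by side.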