Optimizing Knowledge Distillation Training With Volcano

Overview

Explore a conference talk on optimizing knowledge distillation training with Volcano. The approach uses Volcano as a scheduler to deploy Teacher models on the GPU inference cards of online Kubernetes clusters, which increases the throughput of knowledge distillation. Because the Teacher workloads are scheduled flexibly, the method reduces task failures during peak hours and makes fuller use of cluster resources. The talk walks through how elastic distillation training is optimized with Volcano and presents benchmark data. It also covers large-scale training, Elastic Deep Learning, and the advantages of the approach, along with the Volcano architecture, GPU sharing techniques, and Volcano's integration with Kubernetes for efficient model compression and deployment.
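In knowledge distillation, the Teacher's predictions are the training signal for the smaller student model. That is why serving Teachers from the inference cluster directly affects distillation throughput. As background, here is a minimal sketch of the classic temperature-scaled soft-target loss (Hinton et al.), assuming the student and Teacher each produce raw logits for the same example. This is the standard textbook formulation, not the implementation presented in the talk. The function names and the `alpha`/`temperature` defaults are illustrative choices.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2 (Hinton et al.)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return kl * temperature ** 2


def combined_loss(student_logits: List[float],
                  teacher_logits: List[float],
                  label: int,
                  alpha: float = 0.5,
                  temperature: float = 2.0) -> float:
    """Illustrative mix of the soft-target loss and ordinary cross-entropy on the true label."""
    hard = -math.log(softmax(student_logits)[label])
    soft = distillation_loss(student_logits, teacher_logits, temperature)
    return alpha * soft + (1 - alpha) * hard
```

In a setup like the one described, `teacher_logits` would come from a Teacher model served on the inference cluster rather than from a Teacher loaded alongside the student. That separation lets the two be scheduled independently.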

Syllabus

Introduction
Project Background
Large Scale Training
Elastic Deep Learning
Knowledge Distillation
Advantages
Training Vector
William Wang
Challenges
CNCF Sandbox
Volcano Architecture
Survival Kubernetes
Volcano Job
GPU Sharing
Cromwell
Commander
Kubernetes