Overview
Explore a 30-minute talk on optimizing machine learning deployment with Apache TVM and the OctoML Platform. Learn how graph- and operator-level optimizations deliver performance portability across diverse hardware backends, and how TVM's learning-based approach rapidly explores the optimization space, saving engineering time while delivering strong performance for both edge and server use cases. Gain insight into TVM's broad model coverage and efficient use of hardware resources, and get a preview of OctoML's Octomizer, a SaaS platform for continuous model optimization, benchmarking, and packaging. Understand why ML deployment is hard across diverse hardware environments, and how TVM and OctoML address the exploding ecosystem of ML workloads and hardware capabilities.
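The "learning-based approach" mentioned above refers to TVM's auto-tuning, which searches a space of candidate schedules (loop tilings, orderings, etc.) and uses measurements to steer the search. The toy sketch below illustrates the search idea only — the candidate space, the synthetic `measure` function, and the greedy sampling loop are all illustrative stand-ins, not TVM's actual API (real AutoTVM/auto-scheduler runs use a learned cost model over hardware measurements):

```python
import random

# Toy search space: tile sizes for a matrix-multiply loop nest.
# (Illustrative only; not TVM's real schedule space.)
CANDIDATES = [(i, j) for i in (1, 2, 4, 8, 16, 32)
                     for j in (1, 2, 4, 8, 16, 32)]

def measure(tile):
    """Stand-in for a hardware measurement: a synthetic cost surface
    whose minimum sits at tile = (8, 16)."""
    ti, tj = tile
    return abs(ti - 8) + abs(tj - 16)

def tune(trials=12, seed=0):
    """Sample candidate schedules, measure each, keep the best.
    TVM's auto-tuning replaces blind sampling with a learned cost
    model that prioritizes promising candidates, which is what makes
    the search fast enough to be practical."""
    rng = random.Random(seed)
    best_tile, best_cost = None, float("inf")
    for tile in rng.sample(CANDIDATES, trials):
        cost = measure(tile)
        if cost < best_cost:
            best_tile, best_cost = tile, cost
    return best_tile, best_cost
```

The payoff of the learned cost model is that far fewer real hardware measurements are needed to find a near-optimal schedule, which is the engineering-time saving the talk highlights.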
Syllabus
Intro
Machine Learning is hard and costly to deploy
Trend: ML workload diversity is exploding
Trend: ML hardware capabilities exploding
An exploding ecosystem makes ML deployment difficult
ML-based optimizations
Why use Apache TVM?
TVM: Getting Optimal Performance
OctoML's Broad HW & Model Architecture Coverage — Octomizer supports any model architecture with standard operators
Thank you Apache TVM community! 615+!
Taught by
Open Data Science