Serverless Machine Learning Inference with KFServing
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
A CNCF conference talk on KFServing: the evolution of the model inference stack, GPU autoscaling and utilization, multi-model serving for deploying many models, lessons from operating a serverless inference platform (tail latency, cold starts, monitoring and alerting), and the KFServing roadmap for 2020.
Syllabus
Intro
Inference Stack Evolution: PyTorch
Model explanation and model pre/post-processing transformers
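(KFServing pairs the predictor with optional explainer and transformer components; the transformer handles pre- and post-processing around the model. Below is a minimal sketch of a transformer using the kfserving 0.x Python SDK; the model name, predictor host, and normalization logic are illustrative placeholders, not details from the talk.)

    import kfserving
    from typing import Dict

    class ImageTransformer(kfserving.KFModel):
        def __init__(self, name: str, predictor_host: str):
            super().__init__(name)
            self.predictor_host = predictor_host

        def preprocess(self, inputs: Dict) -> Dict:
            # Turn raw request payloads into model-ready inputs.
            return {"instances": [normalize(i) for i in inputs["instances"]]}

        def postprocess(self, inputs: Dict) -> Dict:
            # Map raw model outputs back to a client-friendly response.
            return inputs

    def normalize(instance):
        # Placeholder feature scaling; real logic depends on the model.
        return [float(x) / 255.0 for x in instance]

    if __name__ == "__main__":
        transformer = ImageTransformer("my-model",
                                       predictor_host="my-model-predictor-default")
        kfserving.KFServer(workers=1).start(models=[transformer])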
GPU Autoscaling: The Challenge
Challenge: Increase GPU utilization
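(One common way to raise GPU utilization is batching several inputs into a single predict call, so each HTTP round trip keeps the GPU busier. A hedged sketch against the V1 "instances" HTTP protocol that KFServing exposes; the service URL and model name are hypothetical.)

    import requests

    SERVICE_URL = "http://my-model.default.example.com/v1/models/my-model:predict"

    def predict_batch(instances):
        # One request carrying a whole batch amortizes per-request
        # overhead compared with calling the model item by item.
        resp = requests.post(SERVICE_URL, json={"instances": instances})
        resp.raise_for_status()
        return resp.json()["predictions"]

    preds = predict_batch([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])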
Use Case: Personalized News Monitoring
Challenge: Deploy many models
Proposed Solution: Multi-model Inference Service
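(With multi-model serving, many models share one InferenceService and its pod pool instead of each model getting a dedicated deployment. The sketch below shows the client-side view, assuming models are addressed by name on a shared endpoint; the host and per-user model names are hypothetical.)

    import requests

    HOST = "http://multi-model-service.default.example.com"

    def predict(model_name, instances):
        # Models served from the same pods are selected by name in the path.
        url = f"{HOST}/v1/models/{model_name}:predict"
        resp = requests.post(url, json={"instances": instances})
        resp.raise_for_status()
        return resp.json()["predictions"]

    # Two per-user models behind the same service:
    a = predict("news-user-123", [[0.1, 0.2]])
    b = predict("news-user-456", [[0.3, 0.4]])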
Experience from running a serverless inference platform
Reduce tail latency caused by CPU throttling
Reduce cold start latency
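(A common cold-start mitigation is keeping a minimum number of replicas warm rather than letting the service scale to zero. A sketch using the kfserving 0.x SDK and the v1alpha2 API; the storage bucket, names, and namespace are placeholders.)

    from kubernetes import client
    from kfserving import (KFServingClient, constants, V1alpha2EndpointSpec,
                           V1alpha2PredictorSpec, V1alpha2TFServingSpec,
                           V1alpha2InferenceServiceSpec, V1alpha2InferenceService)

    endpoint = V1alpha2EndpointSpec(
        predictor=V1alpha2PredictorSpec(
            min_replicas=1,  # a warm pod sidesteps scale-to-zero cold starts
            tensorflow=V1alpha2TFServingSpec(storage_uri="gs://my-bucket/my-model")))

    isvc = V1alpha2InferenceService(
        api_version=constants.KFSERVING_GROUP + "/" + constants.KFSERVING_VERSION,
        kind=constants.KFSERVING_KIND,
        metadata=client.V1ObjectMeta(name="my-model", namespace="default"),
        spec=V1alpha2InferenceServiceSpec(default=endpoint))

    KFServingClient().create(isvc)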
Monitoring and Alerting: Control Plane
Monitoring and Alerting: Access logs
Monitoring and Alerting: Inference Service metrics
KFServing Roadmap 2020
Our Working Group is Open
Taught by
CNCF [Cloud Native Computing Foundation]