Optimization for CPUs: distillation, quantization, and caching
Classroom Contents
Deployment - FSDL 2022
- 1 Overview
- 2 First, deploy a prototype with gradio or streamlit
- 3 Model-in-server architecture
- 4 Model-in-database architecture
- 5 Model-as-a-service architecture
- 6 REST APIs for model services
- 7 Dependency management for model services
- 8 Containerization for model services with Docker
- 9 Performance optimization: to GPU or not to GPU?
- 10 Optimization for CPUs: distillation, quantization, and caching
- 11 Optimization for GPUs: Batching and GPU sharing
- 12 Libraries for model serving on GPUs
- 13 Horizontal scaling
- 14 Horizontal scaling with container orchestration (k8s)
- 15 Horizontal scaling with serverless services
- 16 Rollouts: shadows and canaries
- 17 Managed options for model serving (AWS SageMaker)
- 18 Takeaways on model services
- 19 Moving to edge
- 20 Frameworks for edge deployment
- 21 Making efficient models for the edge
- 22 Mindsets and takeaways for edge deployment
- 23 Takeaways for deploying ML models