Overview
Syllabus
Overview
First, deploy a prototype with Gradio or Streamlit
Model-in-server architecture
Model-in-database architecture
Model-as-a-service architecture
REST APIs for model services
Dependency management for model services
Containerization for model services with Docker
Performance optimization: to GPU or not to GPU?
Optimization for CPUs: distillation, quantization, and caching
Optimization for GPUs: batching and GPU sharing
Libraries for model serving on GPUs
Horizontal scaling
Horizontal scaling with container orchestration (Kubernetes)
Horizontal scaling with serverless services
Rollouts: shadows and canaries
Managed options for model serving: AWS SageMaker
Takeaways on model services
Moving to edge
Frameworks for edge deployment
Making efficient models for the edge
Mindsets and takeaways for edge deployment
Takeaways for deploying ML models
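The CPU-optimization topic above lists caching as one technique. A minimal sketch of request-level prediction caching using Python's standard library, where `predict` is a hypothetical stand-in for an expensive CPU inference call (not part of the course materials):

```python
from functools import lru_cache

# Hypothetical model: any deterministic function of its input can be
# memoized this way, so repeated requests skip inference entirely.
@lru_cache(maxsize=1024)
def predict(text: str) -> str:
    # Stand-in for an expensive CPU-bound model forward pass.
    return "positive" if "good" in text else "negative"

predict("a good movie")   # first call: runs the model
predict("a good movie")   # second call: served from the cache
print(predict.cache_info().hits)  # → 1
```

Real model servers typically cache at the service layer (e.g. a key-value store keyed on a hash of the input), but the trade-off is the same: memory for latency.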
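The GPU-optimization topic above mentions batching: amortizing per-call overhead by running many requests through one forward pass. A minimal sketch in plain Python, with `run_batch` as a hypothetical stand-in for a batched model call (the function names are illustrative, not from the course):

```python
def batched(requests, batch_size=8):
    """Group an incoming request stream into fixed-size batches,
    so each batch costs one model invocation instead of many."""
    batch = []
    for request in requests:
        batch.append(request)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

def run_batch(batch):
    # Stand-in for a single batched forward pass on the GPU.
    return [len(x) for x in batch]

results = [y for b in batched(["a", "bb", "ccc"], batch_size=2)
           for y in run_batch(b)]
# → [1, 2, 3]
```

Production servers add a time-based cutoff as well (dispatch a partial batch after, say, 10 ms), trading a little latency per request for much higher throughput.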
Taught by
The Full Stack