Overview
Learn to build and deploy a scalable AI inference service in this technical video tutorial. Starting from the fundamentals of AI inference scaling, compare different inference approaches, examine GPU utilization patterns, and set up one-click templates for streamlined deployment. Configure a Docker image, build an auto-scaling service architecture, and tune model configuration settings for performance. Practice load testing, analyze the key metrics, implement a scaling manager, and configure an API endpoint for integration. Hands-on examples run throughout each module, and the tutorial closes with a look at future developments and advanced topics in AI inference services.
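The tutorial's own code is not reproduced in this listing. As a rough illustration of the kind of inference API endpoint the video sets up, here is a minimal sketch using FastAPI; the framework choice, route, and field names are assumptions rather than details from the video.

```python
# Minimal sketch of an inference API endpoint. FastAPI, the route,
# and the request fields are illustrative assumptions, not the video's code.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InferenceRequest(BaseModel):
    prompt: str
    max_tokens: int = 128  # hypothetical generation limit

class InferenceResponse(BaseModel):
    completion: str

def run_model(prompt: str, max_tokens: int) -> str:
    # Stand-in for a real model call (e.g. a client for a GPU-backed server).
    return prompt[:max_tokens]

@app.post("/v1/generate", response_model=InferenceResponse)
def generate(req: InferenceRequest) -> InferenceResponse:
    return InferenceResponse(completion=run_model(req.prompt, req.max_tokens))
```

Served with a standard ASGI runner (e.g. `uvicorn`), this exposes a single POST route that the load-testing sketch after the syllabus can target.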
Syllabus
- Introduction to AI Inference Scaling
- Video Agenda Overview
- Different Inference Approaches
- Understanding GPU Utilization
- Setting Up One-Click Templates
- Docker Image Configuration
- Building Auto-Scaling Service
- Model Configuration Settings
- Load Testing and Metrics
- Scaling Manager Implementation
- Setting Up API Endpoint
- Conclusion and Future Topics
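To give a concrete flavor of a few of the modules above, some hedged sketches follow. For "Understanding GPU Utilization", a generic probe of per-GPU utilization can be built on `nvidia-smi` (assuming an NVIDIA GPU and driver are present); this is a common pattern, not code from the video.

```python
# Query per-GPU utilization via nvidia-smi. Assumes an NVIDIA driver is
# installed; a generic probe, not the tutorial's implementation.
import subprocess

def gpu_utilization() -> list[int]:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [int(line) for line in out.splitlines() if line.strip()]

print(gpu_utilization())  # e.g. [87] for one busy GPU
```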
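For "Load Testing and Metrics", a minimal async load test might fire concurrent requests at the endpoint sketched above and report latency percentiles. The use of `httpx`, the request count, and the local URL are assumptions; the video may use a dedicated load-testing tool instead.

```python
# Fire N concurrent requests at a local endpoint and report latency
# percentiles. httpx and the URL are assumptions for this sketch.
import asyncio
import statistics
import time

import httpx

URL = "http://localhost:8000/v1/generate"  # hypothetical local endpoint

async def one_request(client: httpx.AsyncClient) -> float:
    start = time.perf_counter()
    await client.post(URL, json={"prompt": "hello", "max_tokens": 32})
    return time.perf_counter() - start

async def main(n: int = 50) -> None:
    async with httpx.AsyncClient(timeout=30.0) as client:
        latencies = await asyncio.gather(*(one_request(client) for _ in range(n)))
    latencies = sorted(latencies)
    print(f"p50={statistics.median(latencies):.3f}s "
          f"p95={latencies[int(0.95 * (n - 1))]:.3f}s")

if __name__ == "__main__":
    asyncio.run(main())
```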
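For "Scaling Manager Implementation", the core idea is a polling loop that adjusts replica count from a load signal. The toy below invents a random queue-depth metric and simple thresholds purely for illustration; the video's actual policy and metrics may differ.

```python
# Toy scaling-manager loop: add a replica when queued requests per replica
# exceed a threshold, remove one when load is light. Metric source and
# thresholds are invented for illustration only.
import random
import time

MIN_REPLICAS, MAX_REPLICAS = 1, 8
SCALE_UP_THRESHOLD = 4.0  # queued requests per replica

def queue_depth() -> int:
    # Stand-in for a real metric (e.g. pending requests from the server).
    return random.randint(0, 40)

def desired_replicas(current: int, depth: int) -> int:
    per_replica = depth / current
    if per_replica > SCALE_UP_THRESHOLD:
        return min(current + 1, MAX_REPLICAS)
    if per_replica < 1.0:
        return max(current - 1, MIN_REPLICAS)
    return current

replicas = MIN_REPLICAS
for _ in range(10):            # one iteration per polling interval
    depth = queue_depth()
    replicas = desired_replicas(replicas, depth)
    print(f"queue={depth:2d} -> replicas={replicas}")
    time.sleep(0.1)            # short interval for the demo
```

In a real service, `queue_depth` would read a live metric and the replica change would call the hosting platform's scaling API rather than just printing.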
Taught by
Trelis Research