Overview
Learn to build and deploy a scalable AI inference service in this technical video tutorial. Starting from the fundamentals of AI inference scaling, compare different inference approaches, examine GPU utilization patterns, and set up one-click templates for streamlined deployment. Configure a Docker image, build an auto-scaling service architecture, and tune model configuration settings for performance. Practice load testing, analyze the key metrics, implement a scaling manager, and configure an API endpoint for integration. Hands-on examples run throughout each module, and the tutorial closes with a look at future developments and advanced topics in AI inference services.
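The tutorial's own code is not reproduced in this listing. As a rough illustration of the kind of inference API endpoint the video sets up, here is a minimal sketch using FastAPI; the framework choice, route, and field names are assumptions rather than details from the video.

```python
# Minimal sketch of an inference API endpoint. FastAPI, the route,
# and the request fields are illustrative assumptions, not the video's code.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InferenceRequest(BaseModel):
    prompt: str
    max_tokens: int = 128  # hypothetical generation limit

class InferenceResponse(BaseModel):
    completion: str

def run_model(prompt: str, max_tokens: int) -> str:
    # Stand-in for a real model call (e.g. a client for a GPU-backed server).
    return prompt[:max_tokens]

@app.post("/v1/generate", response_model=InferenceResponse)
def generate(req: InferenceRequest) -> InferenceResponse:
    return InferenceResponse(completion=run_model(req.prompt, req.max_tokens))
```

Served with a standard ASGI runner (e.g. `uvicorn`), this exposes a single POST route that the load-testing sketch after the syllabus can target.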
Syllabus
- Introduction to AI Inference Scaling
- Video Agenda Overview
- Different Inference Approaches
- Understanding GPU Utilization
- Setting Up One-Click Templates
- Docker Image Configuration
- Building Auto-Scaling Service
- Model Configuration Settings
- Load Testing and Metrics
- Scaling Manager Implementation
- Setting Up API Endpoint
- Conclusion and Future Topics
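To give a concrete flavor of a few of the modules above, some hedged sketches follow. For "Understanding GPU Utilization", a generic probe of per-GPU utilization can be built on `nvidia-smi` (assuming an NVIDIA GPU and driver are present); this is a common pattern, not code from the video.

```python
# Query per-GPU utilization via nvidia-smi. Assumes an NVIDIA driver is
# installed; a generic probe, not the tutorial's implementation.
import subprocess

def gpu_utilization() -> list[int]:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [int(line) for line in out.splitlines() if line.strip()]

print(gpu_utilization())  # e.g. [87] for one busy GPU
```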
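For "Load Testing and Metrics", a minimal async load test might fire concurrent requests at the endpoint sketched above and report latency percentiles. The use of `httpx`, the request count, and the local URL are assumptions; the video may use a dedicated load-testing tool instead.

```python
# Fire N concurrent requests at a local endpoint and report latency
# percentiles. httpx and the URL are assumptions for this sketch.
import asyncio
import statistics
import time

import httpx

URL = "http://localhost:8000/v1/generate"  # hypothetical local endpoint

async def one_request(client: httpx.AsyncClient) -> float:
    start = time.perf_counter()
    await client.post(URL, json={"prompt": "hello", "max_tokens": 32})
    return time.perf_counter() - start

async def main(n: int = 50) -> None:
    async with httpx.AsyncClient(timeout=30.0) as client:
        latencies = await asyncio.gather(*(one_request(client) for _ in range(n)))
    latencies = sorted(latencies)
    print(f"p50={statistics.median(latencies):.3f}s "
          f"p95={latencies[int(0.95 * (n - 1))]:.3f}s")

if __name__ == "__main__":
    asyncio.run(main())
```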
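For "Scaling Manager Implementation", the core idea is a polling loop that adjusts replica count from a load signal. The toy below invents a random queue-depth metric and simple thresholds purely for illustration; the video's actual policy and metrics may differ.

```python
# Toy scaling-manager loop: add a replica when queued requests per replica
# exceed a threshold, remove one when load is light. Metric source and
# thresholds are invented for illustration only.
import random
import time

MIN_REPLICAS, MAX_REPLICAS = 1, 8
SCALE_UP_THRESHOLD = 4.0  # queued requests per replica

def queue_depth() -> int:
    # Stand-in for a real metric (e.g. pending requests from the server).
    return random.randint(0, 40)

def desired_replicas(current: int, depth: int) -> int:
    per_replica = depth / current
    if per_replica > SCALE_UP_THRESHOLD:
        return min(current + 1, MAX_REPLICAS)
    if per_replica < 1.0:
        return max(current - 1, MIN_REPLICAS)
    return current

replicas = MIN_REPLICAS
for _ in range(10):            # one iteration per polling interval
    depth = queue_depth()
    replicas = desired_replicas(replicas, depth)
    print(f"queue={depth:2d} -> replicas={replicas}")
    time.sleep(0.1)            # short interval for the demo
```

In a real service, `queue_depth` would read a live metric and the replica change would call the hosting platform's scaling API rather than just printing.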
Taught by
Trelis Research