Building an Auto-scaling AI Inference Service - From Setup to Deployment

Trelis Research via YouTube

Overview

Learn to build and deploy a scalable AI inference service in this technical video tutorial. Master essential concepts, from AI inference scaling fundamentals to advanced implementation techniques. Explore different inference approaches, understand GPU utilization patterns, and set up one-click templates for streamlined deployment. Dive into Docker image configuration, develop an auto-scaling service architecture, and tune model configuration settings for peak performance. Practice load-testing methodologies, analyze key metrics, and implement a robust scaling manager. Configure API endpoints for seamless integration while following industry best practices. Each module offers hands-on practice with practical examples and real-world scenarios, and the tutorial concludes with a look at future developments and advanced topics in AI inference services.
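
For readers who want a concrete picture of what a scaling manager does before watching, here is a minimal sketch of a queue-driven scaling loop. The thresholds, the get_queue_depth/start_worker/stop_worker helpers, and the idle-timeout policy are illustrative placeholders, not the implementation used in the video.

```python
import time

# Illustrative thresholds -- the tutorial's actual values and provider
# API calls for launching GPU workers will differ.
SCALE_UP_QUEUE_DEPTH = 10    # pending requests per active worker before adding one
SCALE_DOWN_IDLE_SECS = 300   # seconds a worker may sit idle before removal
MIN_WORKERS, MAX_WORKERS = 1, 8

def get_queue_depth() -> int:
    """Placeholder: return the number of requests waiting for inference."""
    raise NotImplementedError

def start_worker() -> str:
    """Placeholder: launch a GPU worker (e.g. a container) and return its id."""
    raise NotImplementedError

def stop_worker(worker_id: str) -> None:
    """Placeholder: terminate an idle GPU worker."""
    raise NotImplementedError

def scaling_loop(poll_interval: float = 15.0) -> None:
    workers: dict[str, float] = {}  # worker id -> timestamp of last useful work
    while True:
        depth = get_queue_depth()
        now = time.time()

        # Always keep at least one worker; scale up when the backlog
        # per worker exceeds the threshold (one worker added per poll).
        if not workers:
            workers[start_worker()] = now
        elif depth / len(workers) > SCALE_UP_QUEUE_DEPTH and len(workers) < MAX_WORKERS:
            workers[start_worker()] = now

        # Treat every worker as busy whenever there is a backlog (a simplification).
        if depth > 0:
            workers = {wid: now for wid in workers}

        # Scale down workers that have been idle too long, keeping a floor.
        for wid, last_busy in list(workers.items()):
            if len(workers) > MIN_WORKERS and now - last_busy > SCALE_DOWN_IDLE_SECS:
                stop_worker(wid)
                del workers[wid]

        time.sleep(poll_interval)
```

In a real deployment, the loop would call a GPU provider's API to launch and terminate workers, and the queue depth would come from the service's request broker or load-test metrics.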

Syllabus

- Introduction to AI Inference Scaling
- Video Agenda Overview
- Different Inference Approaches
- Understanding GPU Utilization
- Setting Up One-Click Templates
- Docker Image Configuration
- Building Auto-Scaling Service
- Model Configuration Settings
- Load Testing and Metrics
- Scaling Manager Implementation
- Setting Up API Endpoint
- Conclusion and Future Topics

Taught by

Trelis Research
