AI Inference Workloads - Solving MLOps Challenges in Production

Toronto Machine Learning Series (TMLS) via YouTube

Now playing: Decouple Web Serving and Model Serving (10 of 17)

Classroom Contents

  1. Intro
  2. Agenda
  3. The Machine Learning Process
  4. Deployment Types for Inference Workloads
  5. Machine Learning is Different than Traditional Software Engineering
  6. Low Latency
  7. High Throughput
  8. Maximize GPU Utilization
  9. Embedding ML Models into Web Servers
  10. Decouple Web Serving and Model Serving (sketched in code after this list)
  11. Model Serving System on Kubernetes
  12. Multi-Instance GPU (MIG)
  13. Run:AI's Dynamic MIG Allocations
  14. Run 3 instances of type 2g.10gb (see the fit check after this list)
  15. Valid Profiles & Configurations
  16. Serving on Fractional GPUs
  17. A Game Changer for Model Inferencing
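
Chapter 10 argues for separating the web tier from the model tier. Below is a minimal sketch of that pattern, assuming a FastAPI web server that forwards inference calls to a standalone model server over HTTP; the model-server URL, port, and request schema are hypothetical placeholders, not anything specified in the talk.

```python
# Minimal sketch: the web process handles HTTP concerns only and
# forwards inference to a separate model-serving process.
import httpx
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

# Assumed address of a standalone model server (e.g. its own pod on
# Kubernetes, per chapter 11). Placeholder, not from the talk.
MODEL_SERVER_URL = "http://model-server:8500/v1/predict"

class PredictRequest(BaseModel):
    inputs: list[float]

@app.post("/predict")
async def predict(req: PredictRequest):
    # Forward the request instead of running the model in-process,
    # so the model tier can scale and hold GPUs independently.
    async with httpx.AsyncClient() as client:
        resp = await client.post(MODEL_SERVER_URL, json={"inputs": req.inputs})
    if resp.status_code != 200:
        raise HTTPException(status_code=502, detail="model server error")
    return resp.json()
```

Because the model server is a separate process, it can be scaled, scheduled, and given GPU resources independently of the web tier, which is the point of the decoupling chapter.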
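Chapters 12 through 15 turn to MIG, which partitions a GPU such as the A100-40GB into fixed-size instances; that card exposes 7 compute slices and 8 memory slices. The sketch below checks whether a requested mix of published A100-40GB profiles fits within those slice budgets. It deliberately ignores MIG's physical placement and alignment rules, so treat it as an arithmetic illustration of why three 2g.10gb instances (chapter 14) form a valid configuration, not as a reimplementation of Run:AI's allocator.

```python
# Simplified MIG fit check for one A100-40GB: 7 compute slices,
# 8 memory slices. Real MIG placement has extra alignment rules
# this sketch ignores.
PROFILES = {
    # name: (compute slices, memory slices)
    "1g.5gb":  (1, 1),
    "2g.10gb": (2, 2),
    "3g.20gb": (3, 4),
    "4g.20gb": (4, 4),
    "7g.40gb": (7, 8),
}

def fits_on_a100(requested: list[str]) -> bool:
    """Return True if the requested profiles' slice totals fit on one GPU."""
    compute = sum(PROFILES[p][0] for p in requested)
    memory = sum(PROFILES[p][1] for p in requested)
    return compute <= 7 and memory <= 8

# Chapter 14's configuration: three 2g.10gb instances use 6 of 7
# compute slices and 6 of 8 memory slices, so they fit.
print(fits_on_a100(["2g.10gb"] * 3))         # True
print(fits_on_a100(["4g.20gb", "4g.20gb"]))  # False: 8 > 7 compute slices
```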
