Production ML Serving and Monitoring with Kubernetes

Overview

Explore a practical guide to building an MLOps deployment platform on Kubernetes, with a focus on serving deep learning models. Learn how to integrate NVIDIA Triton Inference Server, Seldon Core v2, Kafka, Prometheus, and Grafana. Follow an end-to-end workflow for serving complex models such as transformers and CNNs and for configuring monitoring around them. Discover strategies to improve model performance, reduce costs, and support new machine learning use cases. This 42-minute conference talk, presented by Andrew Willson, Head of Customer Success at Seldon, at MLOps World: Machine Learning in Production, is aimed at professionals looking to advance their MLOps capabilities.