Overview
Explore the process of creating a custom serving runtime in KServe ModelMesh to serve machine learning models in this 30-minute conference talk. Gain insights into ModelMesh's key features, learn how to build a new container image supporting the desired frameworks, and understand the deployment strategy. Discover the advantages of the KServe and ModelMesh architecture, including monitoring capabilities with Prometheus and Grafana dashboards. Follow along with hands-on demonstrations of loading models into existing model servers and running predictions using custom serving runtimes. Delve into practical examples and step-by-step instructions for implementing ModelMesh in real-world scenarios.
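For a sense of the workflow the talk walks through, the sketch below (not taken from the talk itself) registers a custom ServingRuntime and deploys a model through it on a ModelMesh-enabled KServe installation, using the Python Kubernetes client. The container image, model format name, namespace, ports, and storage URI are illustrative assumptions.

```python
# Sketch: register a custom ServingRuntime and deploy a model through it
# on a ModelMesh-enabled KServe cluster. The image, ports, namespace, and
# storage details below are assumptions for illustration only.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster
api = client.CustomObjectsApi()

NAMESPACE = "modelmesh-serving"  # assumed installation namespace

serving_runtime = {
    "apiVersion": "serving.kserve.io/v1alpha1",
    "kind": "ServingRuntime",
    "metadata": {"name": "my-custom-runtime"},
    "spec": {
        "supportedModelFormats": [
            {"name": "custom-format", "version": "1", "autoSelect": True}
        ],
        "multiModel": True,  # lets ModelMesh pack many models into one server pod
        "grpcDataEndpoint": "port:8001",  # assumed inference port of the image
        "containers": [
            {
                "name": "my-runtime",
                "image": "example.com/my-custom-runtime:latest",  # assumed image
                "resources": {"limits": {"cpu": "1", "memory": "1Gi"}},
            }
        ],
    },
}

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {
        "name": "example-model",
        "annotations": {"serving.kserve.io/deploymentMode": "ModelMesh"},
    },
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "custom-format"},
                "storageUri": "s3://my-bucket/models/example",  # assumed location
            }
        }
    },
}

# Create the runtime first, then the InferenceService that targets its format.
api.create_namespaced_custom_object(
    "serving.kserve.io", "v1alpha1", NAMESPACE, "servingruntimes", serving_runtime
)
api.create_namespaced_custom_object(
    "serving.kserve.io", "v1beta1", NAMESPACE, "inferenceservices", inference_service
)
```

Once the InferenceService reports ready, predictions can be sent to the ModelMesh gRPC or REST endpoint, which routes each request to a server pod that has the model loaded.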
Syllabus
Introduction
Agenda
What is Model Serving
Deployment Strategy
KServe
Pod Per Model
ModelMesh
ModelMesh Features
ModelMesh Architecture
Monitoring
Prometheus
Grafana Dashboard
Model Loading
Serving Runtime
Why KServe
Step by Step
Example
ModelMesh Example
In Practice
Taught by
Linux Foundation