Observability in the MLOps Lifecycle with Prometheus

Overview

Explore a 24-minute SREcon23 Asia/Pacific talk on the role of observability in the MLOps lifecycle, with Prometheus as the monitoring tool. Begin with a gentle introduction to monitoring ML deployments, covering edge cases in production, data drift, concept drift, model metrics, and standard system and resource metrics. Get an overview of observability and monitoring in MLOps, including how monitoring data can inform decisions such as when to retrain a model or collect more data. Learn how to use Prometheus for essential MLOps monitoring tasks and how to add monitoring to existing deployments. Watch demonstrations of Prometheus integrated with ML deployments built on Flyte, Seldon Core, or FastAPI, with practical guidance on implementing observability in real-world systems.
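As a rough illustration of the kind of data-drift signal a deployment might expose for Prometheus to scrape, the sketch below computes a Population Stability Index (PSI) per input feature and renders the scores in Prometheus's text exposition format as a labeled gauge. PSI, the function names, the metric name `model_feature_psi`, and the sample histograms are all assumptions chosen for illustration, not details taken from the talk. A real service would more likely use the official `prometheus_client` library (a labeled `Gauge` served over HTTP with `start_http_server`) instead of formatting the output by hand; this version stays dependency-free.

```python
import math
from typing import Mapping, Sequence


def psi(reference: Sequence[float], live: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two histograms that share the same bins.

    Counts are normalized to proportions; empty bins are clamped to `eps`
    so the logarithm stays finite.
    """
    if len(reference) != len(live) or not reference:
        raise ValueError("histograms must be non-empty and share the same bins")
    ref_total, live_total = sum(reference), sum(live)
    score = 0.0
    for ref_count, live_count in zip(reference, live):
        p = max(ref_count / ref_total, eps)
        q = max(live_count / live_total, eps)
        score += (q - p) * math.log(q / p)
    return score


def escape_label_value(value: str) -> str:
    """Escape backslashes, double quotes, and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: Mapping[str, float],
                 label: str = "feature") -> str:
    """Render one gauge metric family with one sample per label value."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for label_value, value in sorted(samples.items()):
        lines.append(f'{name}{{{label}="{escape_label_value(label_value)}"}} {value:.6f}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical binned counts: training-time reference vs. recent live traffic.
    reference = {"age": [10, 20, 30, 40], "income": [25, 25, 25, 25]}
    live = {"age": [10, 20, 30, 40], "income": [40, 30, 20, 10]}
    scores = {feature: psi(reference[feature], live[feature]) for feature in reference}
    print(render_gauge("model_feature_psi",
                       "Population Stability Index of each input feature.", scores))
```

In this example the unchanged `age` distribution scores 0, while the shifted `income` distribution scores about 0.23. A Prometheus alerting rule on such a gauge is one way monitoring could feed the retraining decisions the talk describes; any threshold would need to be chosen per model.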

Syllabus

SREcon23 Asia/Pacific - Observability in the MLOps Lifecycle with Prometheus

Taught by

USENIX


Related courses

Prometheus in the MLOps Lifecycle

Beyond Observability - Aligning Technology Performance to Business Outcomes

Better Observability with No Code Changes

Mastering Chaos - Achieving Fault Tolerance with Observability-Driven Prioritized Load Shedding

Closing the Production Gap with MLOps

Towards Zero Carbon - Implementing Sustainable Battery Lifecycle Management in Data Centers
