Cloud-Native Metric Monitoring with Prometheus

Overview

Explore cloud-native metric monitoring with Prometheus in this 37-minute talk by Richard Hartmann of Grafana Labs. Dive into the system used by millions to keep critical applications running, from basic concepts to advanced features. Learn about time series, alerting, and Prometheus' scalability. Discover related technologies such as Loki for log aggregation and Tempo for distributed tracing. Gain insight into SRE practices, the importance of shared understanding in DevOps, and how moving from logs to metrics and traces can improve observability and cut costs.

Syllabus

Intro
Buzzword alert!
Complexity
Services
SRE, an instantiation of DevOps
Shared understanding
Alerting
Prometheus 101
Main selling points
Concepts & guarantees
Time series
New features
Cloud native defaults
Prometheus scale
Loki 101
Loki @ Grafana Labs
Tempo @ Grafana Labs (2022-09)
Logs to metrics, the savings
From logs to traces