Watching the Watchers: How We Do Continuous Reliability at Grafana Labs

Overview

Learn how Grafana Labs practices continuous reliability in this technical conference talk on the real-world challenges of keeping observability tools running. Explore how the company solved a mystery that cost more than $100,000, scaled Mimir clusters to handle 1.3 billion time series, and optimized Loki clusters to process 324 TB of logs per day. See the internal monitoring dashboards used for Grafana Cloud and the lessons learned from production incidents and system failures. Through candid discussion of past problems and current improvements, understand what it takes to implement observability at scale and maintain reliability in complex microservices-based systems.

Syllabus

Watching the Watchers: How We Do Continuous Reliability at Grafana Labs - Nicole van der Hoeven

Taught by

CNCF [Cloud Native Computing Foundation]

