Explore a comprehensive approach to managing Kubernetes cluster metrics and alerts in this 46-minute conference talk by Anusha Ragunathan and Sahil Badla of Intuit Inc. Learn how to apply the industry-standard "Golden Signals" concept to Kubernetes clusters to reduce alert fatigue for platform engineers and SREs. Discover the architecture and components of a metrics pipeline that derives baseline behaviors and detects anomalies. A simulated incident demonstrates how cluster golden signals can distinguish service issues from platform issues, so that incidents can be isolated and remediated efficiently. The speakers also share insights and best practices for running this system at scale, drawn from real-world production experience.
Syllabus
Cluster Golden Signals to Avoid Alert Fatigue at Scale - Anusha Ragunathan & Sahil Badla, Intuit Inc
Taught by
Linux Foundation