

Spike Detection in Alert Correlation at LinkedIn

USENIX via YouTube

Overview

This SREcon21 conference talk covers spike detection for alert correlation in large-scale microservice architectures. It describes how LinkedIn identifies root causes during production outages across thousands of interconnected services, and why it is hard to tell genuine issues from false positives when many alerts fire at once. LinkedIn added anomaly detection based on the Modified Z-Score and Median Absolute Deviation (MAD) to its alert correlation system, and the talk walks through the practical applications, the challenges encountered, and the results in reducing false escalations and time to resolution. It also examines the difference between correlation and causation when monitoring and troubleshooting microservices.
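The Modified Z-Score replaces the mean and standard deviation with the median and MAD, so a single large spike does not inflate the baseline it is measured against. Below is a minimal Python sketch, assuming the input is a plain list of per-interval metric values (for example, error counts per minute). The function names are hypothetical, and the 0.6745 constant and 3.5 cutoff follow the common Iglewicz and Hoaglin formulation rather than any threshold confirmed from the talk.

```python
from statistics import median


def modified_z_scores(values):
    """Return the modified Z-score of each value: 0.6745 * (x - median) / MAD."""
    med = median(values)
    # MAD: the median of the absolute deviations from the median.
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        # At least half the points equal the median, so the score is undefined.
        # Hypothetical choice for this sketch: report no anomalies. A production
        # detector would need a fallback here, since e.g. [0, 0, 0, 0, 50] is missed.
        return [0.0 for _ in values]
    return [0.6745 * (x - med) / mad for x in values]


def detect_spikes(values, threshold=3.5):
    """Return the indices whose absolute modified Z-score exceeds the threshold."""
    return [i for i, z in enumerate(modified_z_scores(values)) if abs(z) > threshold]
```

For example, `detect_spikes([12, 14, 13, 15, 12, 14, 90, 13])` flags only index 6: the median is 13.5, the MAD is 1.0, and the value 90 scores about 51.6, while every other point scores close to 1.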

Syllabus

Intro
Background: Quick Introduction to the LinkedIn Stack
LinkedIn Stack Under the Hood
Finding a Needle in a Haystack
Alert Correlation: A framework that automates the alert correlation process to identify unhealthy microservices
Alert Correlation: Slack Recommendations
A Real Issue
A Spike
Correlation does not mean Causation
Problem Statement: Finding the "right" needle in a needlestack
Modified Z-Score For Outlier Detection
MAD (Median Absolute Deviation)
A Simple Example
Spike Detection Challenges
Results: Spike vs Real

Taught by

USENIX
