Overview
Watch a 16-minute conference talk from USENIX NSDI '22 exploring SpiderMon, a novel system for closed-loop network performance monitoring and diagnosis. Learn how SpiderMon leverages "wait-for" relations to achieve low overhead and high coverage simultaneously, addressing limitations of existing query-driven and blanket monitoring approaches. Discover the system's key components, including the Always-on Monitor, Diagnosis Trigger, Timeout Filter, and Purple Traffic Meter. Understand how SpiderMon aligns telemetry data and utilizes Wait-For Graphs to accurately and quickly diagnose performance problems in data center networks. Gain insights into the system's evaluation, complexity, and potential impact on network management practices.
Syllabus
Introduction
Observations
Root causes
Existing solutions
SpiderMon
Always-on Monitor
Diagnosis
Trigger
Timeout Filter
Timeout Filter Example
Purple Traffic Meter
Telemetry Data
Aligning Data
Wait-For Graph
Calculate Degree
Evaluation
Complexity
Summary
Taught by
USENIX