NetBouncer - Active Device and Link Failure Localization in Data Center Networks

USENIX via YouTube

Overview

This presentation covers NetBouncer, an active failure localization system for data center networks. NetBouncer uses IP-in-IP probing to detect both device and link failures, helping keep data center services highly available. The talk explains why accurately localizing failures among millions of servers and network devices is difficult, and how NetBouncer's algorithm combines troubleshooting domain knowledge with machine learning to cope with real-world data inconsistencies. It also covers the system's deployment in Microsoft Azure's data centers, its performance in detecting spine router gray failures, and its negligible overhead on the server side. Topics include active probing, path selection, device failure detection, and link failure inference.

Syllabus

Intro
This is a true story
Active probing system requires explicit and efficient probing
Observation vs. inference from path probing to failures
Real-world constraints complicate path selection
Device failure detection
Link failure inference: an optimization problem
Real world data inconsistency induces false positives
Evaluation questions
Real cases: spine router gray failure
Accuracy comparison with previous algorithms
NetBouncer algorithm performance
NetBouncer has negligible overheads on the server side

Taught by

USENIX
