Automatic Generation of Runtime Checkers for Production Distributed Systems

Overview

Explore three systematic techniques for automatically generating effective, customized runtime checkers for large distributed systems in this 40-minute Strange Loop Conference talk. Learn about Panorama's approach to capturing in-situ observability, a program reduction method for identifying long-running regions and inserting watchdog hooks, and Oathkeeper's strategy for detecting silent semantic violations. Discover how these techniques can help detect and localize unexpected subtle failures in complex production environments, improving the reliability and availability of modern distributed systems. Gain insights from real-world failure studies and performance evaluations presented by Ryan Huang, an Assistant Professor at Johns Hopkins University specializing in computer systems research.

Syllabus

Intro
Runtime checker (a.k.a. detector/monitor)
Importance of runtime checker
Current checking practice
Complex internals of modern software
Common to exhibit gray failures
A real-world gray failure
Failure root cause
Ideal runtime checkers
A new approach
Panorama: capture in-situ observability
Convert a program into in-situ observer
Identify observation boundary and identities
Extract evidence
Example of analysis
Detecting real-world gray failures
Timeline of detecting failure case f1
Latency overhead to observers
Program reduction approach
Why do reduction?
Identify long-running regions
Select checking target candidates
Reduce long-running methods
Encapsulate checkers
Insert watchdog hooks
Prevent side effects
Watchdog generation
Failure detection evaluation setup
Detecting real-world failures
Silent semantic violations
Real-world failure study
Oathkeeper: detect silent semantic violation
How to express semantics?
Oathkeeper workflow
Emitting semantic event traces
General semantic rule templates
Extracted semantic rules
Runtime overhead
Conclusions

Taught by

Strange Loop Conference
