ScaleCheck - A Single-Machine Approach for Discovering Scalability Bugs in Large Distributed Systems

Overview

Explore a conference talk on ScaleCheck, an innovative approach for discovering scalability bugs in large distributed systems using a single machine. Learn about the program analysis technique employed to identify potential causes of scalability issues and the colocation techniques used to test implementation code at real scales on a commodity PC. Discover how ScaleCheck has been integrated into popular storage systems like Cassandra, HDFS, Riak, and Voldemort, successfully exposing both known and unknown scalability bugs at scales up to 512 nodes on a 16-core PC. Gain insights into the methodology, including Naive Packing, Single Process Cluster, and Global Event Driven Architecture, as well as the concept of Colocation Factor. Understand the limitations and future work focused on scale-dependent CPU processing time.
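The difference between running one service thread per colocated node and running one global event handler per service can be illustrated with a small sketch. This is not ScaleCheck's code: the function name `run_global_handler`, the `EVENTS_PER_NODE` workload, and the counter used as stand-in service logic are all assumptions made for illustration. Only the 512-node cluster size comes from the description above.

```python
import queue
import threading
from collections import Counter

NUM_NODES = 512        # cluster size mentioned in the overview
EVENTS_PER_NODE = 3    # hypothetical workload per node (assumption)


def run_global_handler(num_nodes, events_per_node):
    """Process events for all colocated nodes with ONE handler thread."""
    events = queue.Queue()
    handled = Counter()

    def handler():
        while True:
            item = events.get()
            if item is None:        # sentinel: no more events
                break
            node_id, _payload = item
            handled[node_id] += 1   # stand-in for the per-node service logic

    worker = threading.Thread(target=handler)
    worker.start()

    # Every node enqueues its events into the shared, global queue.
    for node_id in range(num_nodes):
        for seq in range(events_per_node):
            events.put((node_id, seq))
    events.put(None)
    worker.join()
    return handled


handled = run_global_handler(NUM_NODES, EVENTS_PER_NODE)

# A per-node design would need one service thread per node; the global
# design needs a single handler thread regardless of cluster size.
threads_per_node_design = NUM_NODES
threads_global_design = 1
print(len(handled), sum(handled.values()),
      threads_per_node_design, threads_global_design)
```

In this sketch the number of service threads stays constant as the simulated cluster grows, which is the intuition behind packing hundreds of nodes onto one machine without drowning it in threads.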

Syllabus

Intro
ScaleCheck: A Single-Machine Approach for Discovering Scalability Bugs in Large Distributed Systems
An Example: Cassandra Bug #3831
The "Flapping" Bug(s)
Outline introduction
Naive Packing (NP)
Single Process Cluster (SPC): Deploy nodes as threads in a single process
Per-Node Services: A frequent design pattern
Global Event Driven Architecture (GEDA): One global event handler per service
Finding New Bugs
Colocation Factor
Limitations and Future Work: Focus on scale-dependent CPU processing time

Taught by

USENIX
