This talk covers shuffle sharding as implemented in Cortex, a horizontally scalable, highly available, multi-tenant, Prometheus-compatible time series database. It explains the challenges of traditional Dynamo-style replication in large deployments and how shuffle sharding addresses them by automatically selecting a random "replica set" for each tenant, which improves both scalability and tenant isolation. The talk walks through the design of shuffle sharding on the read and write paths of Cortex, covering both the theory and its practical application. A live demonstration shows that multiple replicas can be taken out without affecting all tenants.
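The per-tenant replica-set selection described above can be illustrated with a minimal sketch. This is not Cortex's implementation. The function names `shuffle_shard` and `affected_tenants`, the `ingester-N` instance names, and the shard size are all hypothetical, and the sketch assumes a simple hash-seeded random sample in place of whatever selection logic Cortex actually uses.

```python
import hashlib
import random


def shuffle_shard(tenant_id: str, instances: list[str], shard_size: int) -> list[str]:
    """Pick a deterministic pseudo-random subset of instances for one tenant.

    Hypothetical helper for illustration only; not the Cortex algorithm.
    """
    # A non-positive or oversized shard means "use every instance".
    if shard_size <= 0 or shard_size >= len(instances):
        return sorted(instances)
    # Seed a PRNG from a stable hash of the tenant ID so that every caller
    # computes the same shard for the same tenant and instance set.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    # Sort first so the result does not depend on the input order.
    return sorted(rng.sample(sorted(instances), shard_size))


def affected_tenants(shards: dict[str, list[str]], failed: set[str]) -> list[str]:
    """Return the tenants whose shard contains at least one failed instance."""
    return sorted(t for t, shard in shards.items() if failed & set(shard))


if __name__ == "__main__":
    instances = [f"ingester-{i}" for i in range(12)]  # assumed instance names
    tenants = [f"tenant-{i}" for i in range(20)]      # assumed tenant IDs
    shards = {t: shuffle_shard(t, instances, 3) for t in tenants}

    failed = {"ingester-0", "ingester-1"}
    hit = affected_tenants(shards, failed)
    # Without shuffle sharding every tenant uses every instance, so any
    # failure touches all tenants; with it only some tenants are touched.
    print(f"{len(hit)} of {len(tenants)} tenants touch a failed instance")
```

Because each tenant's shard is derived only from its ID and the instance list, no coordination is needed to agree on it. A production design would also have to limit how much shards change when instances are added or removed, which this sketch does not attempt.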
Better Scalability and More Isolation? The Cortex "Shuffle Sharding" Story
CNCF [Cloud Native Computing Foundation] via YouTube
Syllabus
Intro
Overview
Scaling Prometheus
What is Cortex
Timeline
Scalability
Shuffle Sharding
Shuffle Shard Algorithm
Results
Recap
Taught by
CNCF [Cloud Native Computing Foundation]