Better Scalability and More Isolation? The Cortex "Shuffle Sharding" Story

Overview

Learn how Cortex, a horizontally scalable, highly available, multi-tenant, Prometheus-compatible time series database, uses "shuffle sharding" to improve scalability and tenant isolation in large deployments. See where traditional Dynamo-style replication falls short and how shuffle sharding addresses those limits by automatically selecting a random "replica set" for each tenant. Follow the design principles behind shuffle sharding on both the read and write paths of Cortex. Watch a live demonstration in which multiple replicas are taken out without affecting all tenants. Finish with the theory behind the technique and its practical effect on reliability and performance in distributed systems.
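The idea of giving each tenant its own random replica set can be illustrated with a short sketch. This is not Cortex's actual algorithm or API: the function names `shuffle_shard` and `tenants_touched`, the instance names, and the choice of a SHA-256-seeded random sample are all assumptions made for illustration. The sketch only shows the general principle that a tenant's shard is a deterministic pseudo-random subset of instances, so a failure of a few instances touches only the tenants whose shards include them.

```python
import hashlib
import random


def shuffle_shard(tenant_id: str, instances: list[str], shard_size: int) -> list[str]:
    """Return a deterministic pseudo-random subset of instances for one tenant.

    Hypothetical helper for illustration; not Cortex's implementation.
    """
    # A non-positive or oversized shard size means "use every instance".
    if shard_size <= 0 or shard_size >= len(instances):
        return sorted(instances)
    # Seed a private RNG from the tenant ID so the same tenant always
    # gets the same shard, regardless of the input order of instances.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return sorted(rng.sample(sorted(instances), shard_size))


def tenants_touched(shards: dict[str, list[str]], failed: set[str]) -> list[str]:
    """List tenants that have at least one failed instance in their shard."""
    return sorted(t for t, shard in shards.items() if failed & set(shard))


if __name__ == "__main__":
    # Hypothetical cluster of 12 instances and 50 tenants, 3 instances each.
    instances = [f"ingester-{i}" for i in range(12)]
    shards = {f"tenant-{n}": shuffle_shard(f"tenant-{n}", instances, 3) for n in range(50)}
    failed = {"ingester-0", "ingester-1"}
    touched = tenants_touched(shards, failed)
    print(f"{len(touched)} of {len(shards)} tenants have a failed instance in their shard")
```

Running the script prints how many of the 50 hypothetical tenants are touched when two instances fail. With every tenant spread across all instances instead, that number would always be 50.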

Syllabus

Intro
Overview
Scaling Prometheus
What is Cortex
Timeline
Scalability
Shuffle Sharding
Shuffle Shard Algorithm
Results
Recap

Taught by

CNCF [Cloud Native Computing Foundation]
