Explore the architecture and resilience strategies of Cortex, a CNCF open-source project that provides scalable, highly available, multi-tenant, long-term storage for Prometheus. Learn about the key features that prevent failures or reduce their impact so that metrics keep flowing. Discover how the hash ring and replication factor provide crash tolerance, zone-aware replication protects against outages, tenant limits control cost and usage, instance limits prevent process overload, and shuffle sharding limits the impact of an outage. Gain insights from Adobe and Amazon Web Services experts on using these features to run a robust multi-tenant Prometheus system in 2023.
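To make the hash ring, replication factor, and zone-aware replication ideas more concrete, here is a minimal Python sketch. It is not Cortex's implementation. The `Ring` class, the `replicas` method, the token count, and the instance and zone names are all hypothetical and chosen only for illustration. Each instance registers several tokens on a ring, and a series is assigned to the owners of the next tokens clockwise from its hash, skipping zones that already hold a copy.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 32-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    """Toy consistent-hash ring (illustrative only, not Cortex code)."""

    def __init__(self, instances, tokens_per_instance=64):
        # instances: dict mapping instance name -> zone name (hypothetical names)
        self.zones = dict(instances)
        entries = sorted(
            (_hash(f"{name}-{i}"), name)
            for name in instances
            for i in range(tokens_per_instance)
        )
        self.tokens = [token for token, _ in entries]
        self.owners = [name for _, name in entries]

    def replicas(self, series_key, replication_factor=3, zone_aware=True):
        """Walk clockwise from the key's hash and collect distinct owners."""
        start = bisect.bisect_right(self.tokens, _hash(series_key))
        chosen, used_zones = [], set()
        for step in range(len(self.tokens)):
            name = self.owners[(start + step) % len(self.tokens)]
            zone = self.zones[name]
            # Skip instances already chosen and, if zone-aware, zones already used.
            if name in chosen or (zone_aware and zone in used_zones):
                continue
            chosen.append(name)
            used_zones.add(zone)
            if len(chosen) == replication_factor:
                break
        return chosen


ring = Ring({
    "ingester-1": "zone-a", "ingester-2": "zone-a",
    "ingester-3": "zone-b", "ingester-4": "zone-b",
    "ingester-5": "zone-c", "ingester-6": "zone-c",
})
owners = ring.replicas('tenant-1/http_requests_total{job="api"}')
print(owners, [ring.zones[name] for name in owners])
```

In this sketch, a replication factor of 3 across three zones places each series in three different zones, so losing a single instance or a whole zone still leaves two copies of the data.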
Cortex - Running a Rock Solid Multi-Tenant Prometheus
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Syllabus
Cortex: How to Run a Rock Solid Multi-Tenant Prometheus - Friedrich Gonzalez, Adobe & Alan Protasio
Taught by
CNCF [Cloud Native Computing Foundation]