Explore a conference talk on the challenges of scaling a centralized, multi-tenant workflow orchestration platform at Bloomberg. Learn how the engineering team handled rapid growth, including spiky submission rates, unpredictable resource requests, and heavy demand on the API server. Discover why throttling and garbage collection are hard to implement when users and workloads are diverse. Gain insight into the difficulties of maintaining a global Argo installation and meeting varied requirements within a single cluster. Examine the potential impact of these scaling issues, and take away generalizable guardrails and mitigation strategies that apply to other batch workloads.
Growing Pain: Scaling to 10K Workflows per Week
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Syllabus
Growing Pain: Scaling to 10K Workflows per Week - Yao Lin & Harris Khan, Bloomberg
Taught by
CNCF [Cloud Native Computing Foundation]