Building Armada - Running Batch Jobs at Massive Scale on Kubernetes

Overview

This conference talk describes how G-Research designed and built Armada, a system for running batch jobs on Kubernetes at very high throughput. Armada schedules millions of batch jobs across multiple clusters and tens of thousands of nodes. G-Research runs this large-scale batch compute for advanced machine learning and data science applications, such as spotting patterns in financial markets and predicting future trends, so getting good utilization out of the hardware is a central concern.

The talk covers Armada's architecture and core concepts, how users access it, the anatomy of its clusters, the challenges and techniques involved in running Kubernetes at scale, and security considerations. It closes with war stories and lessons learned, and with the roadmap for Armada's future development.

Syllabus

Introduction
What is Armada
How we use Armada
Core Concepts
User Access
Architecture
Cluster Anatomy
Scaling Kubernetes
Security
Challenges
Successes
Roadmap
How to use Armada
Questions