Scaling to Millions of ML Models to Solve the Problems of SRE and Security
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore a conference talk on scaling machine learning to make SRE and security work more effective. Learn how millions of ML models operating on petabytes of operational and user data can strengthen a zero trust security framework and infrastructure diagnosis. Discover the challenges of running machine learning and anomaly detection on Kubernetes nodes and an Envoy-based service mesh, along with the solutions the speakers adopted. Gain insights into collecting data from hundreds of thousands of nodes, handling the high cardinality of models, and distributing inference models to K8s nodes. Understand how open-source technologies such as Kubernetes, Prometheus, Cortex, Apache Spark, and Apache Arrow fit together in a production deployment. Delve into the complex architecture, infrastructure scaling, and Databricks integration. Examine code snippets and practical applications of the Group pandas API and Apache Arrow.
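As a rough illustration of what "high cardinality of models" means in practice, the sketch below fits one very small baseline model per key (for example, per node) and scores new values against that key's own model. It is not the speakers' code: the function names, the sample data, and the z-score threshold are all hypothetical, and it uses only the Python standard library. In Spark, the same one-model-per-group pattern is commonly expressed with the grouped pandas interface (`groupBy(...).applyInPandas(...)` in PySpark 3.0 and later), which uses Apache Arrow to move data between the JVM and Python. Whether the talk uses that exact call is not stated in this listing.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_models(samples):
    """Fit one tiny baseline model (mean, population std) per key.

    `samples` is an iterable of (key, value) pairs; the key stands in
    for whatever identifies a model (node, service, user, ...).
    """
    grouped = defaultdict(list)
    for key, value in samples:
        grouped[key].append(value)
    return {key: (mean(vals), pstdev(vals)) for key, vals in grouped.items()}


def is_anomaly(models, key, value, threshold=3.0):
    """Flag a value whose z-score against its own key's model exceeds threshold."""
    mu, sigma = models[key]
    if sigma == 0:
        # Degenerate model: every training value was identical.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical metric samples: two nodes with very different normal ranges.
samples = [
    ("node-a", 10.0), ("node-a", 12.0), ("node-a", 11.0), ("node-a", 9.0),
    ("node-b", 200.0), ("node-b", 210.0), ("node-b", 190.0), ("node-b", 205.0),
]
models = fit_models(samples)
```

A value of 200.0 is ordinary for `node-b` but far outside the baseline for `node-a`, which is why a separate model per key can catch anomalies that a single global model would miss. The cost is that the number of models grows with the number of keys.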
Syllabus
Introduction
Meet the speakers
Agenda
What is Volterra
Machine Learning Applications
Metrics and Logs
Complex Architecture
Infrastructure
Scaling
Spark and Kubernetes
Databricks Integration
Apache Arrow
Group pandas API
Code snippet
Conclusion
Taught by
CNCF [Cloud Native Computing Foundation]