Overview
Discover how to run Apache Spark jobs on Kubernetes in this 52-minute video presentation by Databricks. Learn about the challenges of managing Spark infrastructure and how Kubernetes simplifies workload isolation, resource management, and on-demand deployment. Explore the benefits of unifying analytics and data science on a single cloud-native architecture, which removes the need for separate big data clusters. Gain insights into the different modes of operation, the Spark Operator on Kubernetes, and strategies for automating DevOps processes. Delve into serverless Spark, functions, workflows, and microservice architecture. Understand how this integration enables more efficient ML pipelines and streamlined CI/CD processes. By the end of the talk, you will understand how combining Spark and Kubernetes can enhance your data processing capabilities and simplify your analytics infrastructure.
Syllabus
Intro
Challenges
ML Pipeline
Kubernetes
Cloud Native
Spark Over Kubernetes
Running Spark on Kubernetes
Different modes of operation
Spark Operator on Kubernetes
Recap
Repo
Challenges around Kubernetes
Automating DevOps
Serverless Spark
Functions
Workflows
Payoneer
CI/CD
Microservice Architecture
Serverless Architecture
MLRun
Summary
Taught by
Databricks