Running Apache Spark Jobs Using Kubernetes

Overview

This 52-minute video presentation by Databricks shows how to use Kubernetes to run Apache Spark jobs. It covers the challenges of managing Spark infrastructure and how Kubernetes simplifies workload isolation, resource management, and on-demand deployment. It also argues for unifying analytics and data science on a single cloud-native architecture, which removes the need for separate big data clusters. Topics include the different modes of operation, the Spark Operator on Kubernetes, and strategies for automating DevOps processes, along with serverless Spark, functions, workflows, and microservice architecture. The talk explains how this integration enables more efficient ML pipelines and streamlined CI/CD processes. It closes by showing how combining Spark and Kubernetes can extend your data processing capabilities while simplifying your analytics infrastructure.

Syllabus

Intro
Challenges
ML Pipeline
Kubernetes
Cloud Native
Spark Over Kubernetes
Running Spark on Kubernetes
Different modes of operation
Spark Operator on Kubernetes
Recap
Repo
Challenges around Kubernetes
Automating devops
Serverless Spark
Functions
Workflows
Payoneer
CI/CD
Microservice Architecture
Serverless Architecture
MLRun
Summary