Spark on Kubernetes - Best Practice and Performance
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore best practices and performance optimization techniques for running Apache Spark on Kubernetes in this 39-minute conference talk by Junjie Chen and Jerry Shao from Tencent. Learn about deploying Spark as a public cloud service on Kubernetes, covering topics such as authorization, logging, and multi-tenancy management. Discover performance tuning strategies for maximizing resource utilization, including detailed configuration adjustments for both Kubernetes and Spark. Gain insights into achieving high availability through ZooKeeper integration and understand the performance impact of various configurations using TPC-DS benchmark workloads. Delve into the architecture, applications, storage services, and environments involved in Spark on Kubernetes deployments, and benefit from the speakers' real-world experience and practical advice for optimizing big data services on containerized platforms.
Syllabus
Introduction
What is Spark
Why do we need Kubernetes
Architecture
Spark Application
Spark on Kubernetes status
Applications
Storage
Service
Structure
HDFS
Catalog
Highs
Environments
Benchmark Configuration
Benchmark Results
Data Locality
Our Experience
Summary
Taught by
CNCF [Cloud Native Computing Foundation]