Overview
Syllabus
Intro
What is Apache Spark?
A Large Community
Apache Spark Users
Original Spark Vision
Motivation: Unification
Motivation: Concise API
How Did the Vision Hold Up?
Libraries Built on Spark
Which Libraries Do People Use?
Top Applications
Main Challenge: Functional API
Which API Call Causes Most Tickets?
Example Problem
Challenge: Data Representation
Why Structure?
DataFrames and Datasets
Execution Steps
DataFrame API
Why DataFrames?
What Structured APIs Enable
Performance
Dataset API Details
Data Sources
Data Source API
Examples
Hardware Trends
Project Tungsten
Tungsten's Compact Encoding
Space Efficiency
Runtime Code Generation
Long-Term Vision
Versioning in Spark
Major Features in 2.0
Background
Structured Streaming: High-level streaming API built on DataFrames/Datasets
Structured Streaming API
Example: Batch Aggregation
Example: Continuous Aggregation
Incrementalized By Spark
Release Timeline
Conclusion
Want to Learn Apache Spark?
Taught by
Scala Days Conferences