Overview
Syllabus
Intro
Deep Dive into the New Features of Apache Spark 3.0
A Delta Lake 0.7.0 + Spark 3.0 AMA
Spark Catalyst Optimizer
Adaptive Query Execution (AQE)
Apache Spark™ 3.0 AQE Fundamentals
Starting with Broadcast Hash Joins
Dynamically Switching Join Strategies: Apache Spark 3.0 AQE Fundamentals
Dynamically Coalescing Shuffle Partitions: Apache Spark 3.0 AQE Fundamentals
Dynamically Optimizing Skew Joins
TPC-DS performance gains from AQE
Dynamic Partition Pruning: Before Optimization
How to Use Join Hints: Broadcast Hash Join
Extensibility and Ecosystem
Data Source V2
But What Happens with DML Under the Covers? What really happens to the file system when you run DELETE, UPDATE, and MERGE?
Time Travel: The transaction log and additive files (data versioning)
Control Table History Retention
Enable DataSourceV2 and Catalog API Integration
Data Quality Framework: Improved SQL DDL and DML, and ACID transactions, are just the start
Lakehouse Paradigm: Improved performance and DW-like capabilities on low-cost cloud object stores
Try out Spark 3.0 + Delta Lake now!
Taught by
Databricks