How Apache Spark 3.0 and Delta Lake Enhance Data Lake Reliability

Databricks via YouTube

Overview

This Seattle Spark + AI Meetup video covers the performance improvements in Apache Spark 3.0 and the ways Delta Lake makes data lakes more reliable. The Spark portion starts with the Spark Catalyst Optimizer and logical and physical planning, then introduces the new Adaptive Query Execution (AQE) framework: broadcast hash joins, dynamically switching join strategies, coalescing, and handling skewed queries through split partitioning, with query performance gains shown on a 3TB TPC-DS benchmark. It also examines the traditional data warehousing problem and shows how Dynamic Partition Pruning (DPP) speeds up queries by pruning partitions in star schema designs.

The Delta Lake portion explains how ACID transactions, Schema Enforcement, and Time Travel improve data lake reliability. It covers Catalog APIs, SQL statement support, and partial rights, followed by the Data Quality Framework and improved performance in Delta Lake. The presentation is aimed at big data professionals and enthusiasts who want an overview of Apache Spark 3.0 and Delta Lake.
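For orientation before watching, here is a minimal sketch of the Spark SQL settings that control the AQE and DPP features named in the overview. The configuration keys are standard Spark 3.0 settings; the dictionary name and the `to_submit_args` helper are hypothetical and are not taken from the talk. In Spark 3.0, AQE is off by default, while Dynamic Partition Pruning is on by default.

```python
# Spark 3.0 SQL settings related to the features named in the overview.
# The keys are real Spark configuration names; the dict name is illustrative.
SPARK_3_FEATURE_CONF = {
    # Adaptive Query Execution (disabled by default in Spark 3.0)
    "spark.sql.adaptive.enabled": "true",
    # Coalesce small shuffle partitions at runtime (requires AQE)
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    # Split skewed partitions in sort-merge joins (requires AQE)
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # Dynamic Partition Pruning (enabled by default in Spark 3.0)
    "spark.sql.optimizer.dynamicPartitionPruning.enabled": "true",
}


def to_submit_args(conf):
    """Render a config dict as a flat list of spark-submit arguments.

    Hypothetical helper for illustration only.
    """
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Prints the flags on one line, ready to paste after `spark-submit`.
    print(" ".join(to_submit_args(SPARK_3_FEATURE_CONF)))
```

The same keys can also be set on a running session with `spark.conf.set(key, value)`, assuming a Spark 3.0 or later session object named `spark`.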

Syllabus

Introduction
Who is Danny
Free Download
Databricks
Download the book
Adaptive Query Execution
Apache Spark 3.0
Performance
Spark Catalyst Optimizer
Logical and Physical Planning
AQE Fundamentals
Broadcast Hash Joins
Why not always broadcast join
Dynamically switch join strategies
Flipping the switch
Off script partitioning
Coalescence
Table Size
Coalescing
Traditional Data Warehousing Problem
Split Partitioning
Q&A
Dynamic Partition Pruning
Dynamic Partition Pruning Before Optimization
Filter Scan
Results
Pseudo Rush
Building Ecosystem
Data Lake Reliability
Catalog API
SQL Statement Support
Partial Rights
Delete
Delete from Events
History Retention
Data Source v2 Catalog API
Data Quality Framework
Improved Performance
More About Delta

Taught by

Databricks
