Deep Dive into New Features of Apache Spark 3.1

Databricks via YouTube

Overview

Explore the latest advancements in Apache Spark 3.1 through this comprehensive 49-minute Databricks video. Dive deep into over 1500 resolved JIRAs, focusing on key improvements that make Spark faster, easier, and smarter. Learn about crucial SQL features for ANSI compliance, innovative streaming capabilities, and Python usability enhancements. Discover performance optimizations and new tuning techniques in the query compiler. Gain insights into upcoming major initiatives and future developments. Through examples and demos, understand important changes such as ANSI SQL mode, unified CREATE TABLE syntax, CHAR/VARCHAR support, node decommissioning, shuffle hash join improvements, partition pruning, predicate pushdown, and reduced query compiling latency. Explore advancements in stream-stream joins, state store for Structured Streaming, PySpark type hints, static error detection, Python dependency management, and new utility functions for Unix time and time zones. Familiarize yourself with usability enhancements, documentation updates, and important deprecations and removals in this essential update for Spark developers and data professionals.

Syllabus

Intro
ANSI SQL Compliance
Fail Earlier for Invalid Data
Forbid Confusing CAST
ANSI Mode GA in Spark 3.2
Unified CREATE TABLE SQL Syntax
CHAR/VARCHAR Support
More ANSI Features Coming in Spark 3.2!
Node Decommissioning
Summary
SQL Performance
Shuffle Hash Join Improvement
Partition Pruning Improvement
Predicate Pushdown Improvement
Reduce Query Compiling Latency (3.2)
Stream-stream Join
State Store for Structured Streaming
RocksDB State Store
Add PEP 484 type hints to PySpark!
Static Error Detection
Python Dependency Management
Visualization and Plotting
Usability Enhancements
New Utility Functions for Unix Time
New Utility Functions for Time Zone
EXPLAIN FORMATTED
Ignore Hints
Documentation and Environments
New Doc for PySpark
Deprecations and Removals

Taught by

Databricks
