Overview
Explore effective patterns and operational insights from early adopters of Delta Lake in this 42-minute conference talk. Discover how to handle demanding workloads over large volumes of log and telemetry data for cyber threat detection and response. Learn about streaming ETL, data enrichment, analytic workloads, and large materialized aggregates for fast answers. Dive into Z-ordering optimization techniques, including schema design considerations and the 32-column default limit on statistics collection. Understand the implications of date partitioning with long-tail distributions and unsynchronized clocks. Gain insight into optimization strategies, including when to use auto-optimize. Explore upsert patterns that simplify important jobs, and learn how to tune Delta Lake for very large tables and low-latency access. Benefit from real-world experience operating large-scale workloads on Databricks and Delta Lake, covering topics such as the Parse Framework, merge operations, stateful processing, scaling, schema ordering, partitioning, and handling conflicting transactions.
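Several of the operations mentioned above (upserts via MERGE, Z-ordering, and auto-optimize) are easier to picture with a short example. The following is a minimal PySpark sketch, not code from the talk: it assumes a Databricks environment with Delta Lake available, and the table name events, its columns (event_id, device_id, event_date), and the source path are hypothetical.

# Minimal sketch; assumes Databricks with Delta Lake. Table, column, and path
# names below are hypothetical and not taken from the talk.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Upsert pattern: merge a batch of updates into the target table by key.
target = DeltaTable.forName(spark, "events")
updates = spark.read.format("delta").load("/tmp/events_updates")  # hypothetical source

(target.alias("t")
    .merge(updates.alias("u"), "t.event_id = u.event_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())

# Z-ordering: cluster data files by commonly filtered columns so data skipping
# works well. Statistics are collected only for the first 32 columns by default
# (delta.dataSkippingNumIndexedCols), so frequently filtered columns should
# appear early in the schema.
spark.sql("OPTIMIZE events ZORDER BY (device_id, event_date)")

# Auto-optimize (Databricks table properties): keep file sizes healthy on
# streaming writes without scheduling separate OPTIMIZE jobs.
spark.sql("""
  ALTER TABLE events SET TBLPROPERTIES (
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.autoOptimize.autoCompact'   = 'true'
  )
""")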
Syllabus
Introduction
Parse Framework
Merge
Stateful Processing
Merged Tables
Scaling
Schema Ordering
Partitioning
Conflicting Transactions
Metadata
Taught by
Databricks