Overview
Explore effective patterns and operational insights from early adopters of Delta Lake in this 42-minute conference talk. Discover how to handle demanding workloads over large volumes of log and telemetry data for cyber threat detection and response. Learn about streaming ETL, data enrichment, analytic workloads, and large materialized aggregates for fast answers. Dive into Z-ordering optimization techniques, including schema design considerations and the 32-column default limit on statistics collection. Understand the implications of date partitioning with long-tail distributions and unsynchronized clocks. Gain insight into optimization strategies, including when to use auto-optimize. Explore upsert patterns that simplify important jobs, and learn how to tune Delta Lake for very large tables and low-latency access. Benefit from real-world experience operating large-scale workloads on Databricks and Delta Lake, covering topics such as the Parse Framework, merge operations, stateful processing, scaling, schema ordering, partitioning, and handling conflicting transactions.
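Several of the operations mentioned above (upserts via MERGE, Z-ordering, and auto-optimize) are easier to picture with a short example. The following is a minimal PySpark sketch, not code from the talk: it assumes a Databricks environment with Delta Lake available, and the table name events, its columns (event_id, device_id, event_date), and the source path are hypothetical.

# Minimal sketch; assumes Databricks with Delta Lake. Table, column, and path
# names below are hypothetical and not taken from the talk.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Upsert pattern: merge a batch of updates into the target table by key.
target = DeltaTable.forName(spark, "events")
updates = spark.read.format("delta").load("/tmp/events_updates")  # hypothetical source

(target.alias("t")
    .merge(updates.alias("u"), "t.event_id = u.event_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())

# Z-ordering: cluster data files by commonly filtered columns so data skipping
# works well. Statistics are collected only for the first 32 columns by default
# (delta.dataSkippingNumIndexedCols), so frequently filtered columns should
# appear early in the schema.
spark.sql("OPTIMIZE events ZORDER BY (device_id, event_date)")

# Auto-optimize (Databricks table properties): keep file sizes healthy on
# streaming writes without scheduling separate OPTIMIZE jobs.
spark.sql("""
  ALTER TABLE events SET TBLPROPERTIES (
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.autoOptimize.autoCompact'   = 'true'
  )
""")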
Syllabus
Introduction
Parse Framework
Merge
Stateful Processing
Merged Tables
Scaling
Schema Ordering
Partitioning
Conflicting Transactions
Metadata
Taught by
Databricks