Learn how Apache Iceberg manages deleted rows and optimizes Change Data Capture (CDC) performance in this lightning talk. Explore the challenges of ingesting and maintaining CDC streams from transactional databases in an Iceberg lakehouse, focusing on the query performance degradation that sets in as change frequency and volume increase. Discover the distinctions between position and equality delete files, and understand how recent Presto enhancements optimize Merge on Read (MoR) with equality deletes by applying them through join operations, resulting in query performance improvements of up to 400x. Gain insights into the trade-offs between Copy on Write (CoW) and MoR, file size considerations, and table refresh timing strategies.
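To make the two delete styles mentioned in the description concrete, here is a minimal Python sketch. It is a toy in-memory model, not Presto or Iceberg code: the `Row` class, the file names, and both helper functions are hypothetical, and it ignores sequence numbers and partitioning. A position delete names a row by data file path and row ordinal, while an equality delete names rows by column value. The second helper applies equality deletes the way an anti-join would, building a hash set of deleted keys once and probing it for each row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    file: str  # path of the data file holding the row (hypothetical names)
    pos: int   # ordinal position of the row inside that file
    id: int    # key column used by equality deletes in this sketch
    name: str


def apply_position_deletes(rows, position_deletes):
    """Drop rows whose (file, pos) pair is listed in the position deletes."""
    return [r for r in rows if (r.file, r.pos) not in position_deletes]


def apply_equality_deletes(rows, deleted_keys, key=lambda r: r.id):
    """Drop rows whose key matches a deleted key, anti-join style:
    build a hash set once, then probe it once per row."""
    build_side = set(deleted_keys)
    return [r for r in rows if key(r) not in build_side]


rows = [
    Row("data-1.parquet", 0, 1, "alice"),
    Row("data-1.parquet", 1, 2, "bob"),
    Row("data-2.parquet", 0, 3, "carol"),
    Row("data-2.parquet", 1, 2, "bob-old"),
]

# A position delete targets exactly one physical row.
after_pos = apply_position_deletes(rows, {("data-1.parquet", 0)})

# An equality delete on id = 2 removes every matching row, in any file.
after_eq = apply_equality_deletes(rows, [2])

print([r.name for r in after_pos])
print([r.name for r in after_eq])
```

In this toy model the writer of an equality delete never has to locate the affected rows, which is what makes it attractive for CDC ingestion. The cost moves to read time, where a join-style lookup keeps the work per row constant.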
Overview
Syllabus
How we accelerated our Iceberg queries for CDC with MoR and Equality Deletes
Taught by
Presto Foundation