Overview
Learn how to implement low-latency Change Data Capture (CDC) pipelines in this technical conference talk, which explores using Apache Paimon and Apache Flink for efficient data lake synchronization. Discover the main CDC mechanisms and their trade-offs, and understand how to handle challenges such as record-level upserts and deletes, data compaction, and schema evolution. Explore the partial-update merge engine and changelog tracking capabilities for streaming data, and see how Apache Paimon compares with Apache Hudi and Apache Iceberg. Learn techniques for building CDC pipelines that minimize latency with Merge-on-Read operations, and get practical guidance on choosing the right solution for specific analytics use cases.
Syllabus
Low latency Change Data Capture (CDC) to your data lake, using Apache Flink and Apache Paimon
Taught by
OSACon