Low Latency Change Data Capture to Data Lake Using Apache Flink and Apache Paimon

Overview

Learn how to implement low-latency Change Data Capture (CDC) pipelines in this technical conference talk, which explores using Apache Flink and Apache Paimon for efficient data lake synchronization. Discover various CDC mechanisms and their trade-offs, and learn how to overcome challenges such as record-level upserts and deletes, data compaction, and schema evolution. Explore Paimon's partial-update merge engine and its changelog tracking for streaming data, and compare the strengths of Apache Paimon with those of Apache Hudi and Apache Iceberg. Finally, see how Merge-on-Read operations help CDC pipelines minimize latency, and get practical guidance on choosing the right solution for a given analytics workload.
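To make the partial-update and Merge-on-Read ideas concrete, here is a toy Python model. It is not Paimon's API or implementation. It assumes a hypothetical change-record format of (op, key, fields), using Flink-style row kinds ("+I"/"+U" for upserts, "-D" for deletes). At read time, changes are applied on top of an untouched base snapshot: non-null fields overwrite earlier values and nulls are ignored, which mirrors how Paimon's partial-update merge engine treats null fields. Delete handling is deliberately simplified, because in Paimon whether the partial-update engine accepts deletes depends on table options.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
# Hypothetical change-record format: (op, primary_key, fields).
# op uses Flink-style short row kinds: "+I"/"+U" = upsert, "-D" = delete.
Change = Tuple[str, int, Dict[str, Optional[object]]]


def merge_on_read(base: Dict[int, Row], changes: List[Change]) -> Dict[int, Row]:
    """Toy Merge-on-Read with partial-update semantics (not Paimon's code).

    The base snapshot is never mutated; changes are applied in order at read time.
    """
    merged = {key: dict(row) for key, row in base.items()}  # copy the base snapshot
    for op, key, fields in changes:
        if op == "-D":
            # Simplification: drop the whole row. Real Paimon behavior for
            # deletes under partial-update depends on table options.
            merged.pop(key, None)
            continue
        if op not in ("+I", "+U"):
            raise ValueError(f"unsupported op: {op}")
        row = merged.setdefault(key, {})
        for name, value in fields.items():
            if value is not None:  # nulls do not overwrite existing values
                row[name] = value
    return merged


if __name__ == "__main__":
    base = {1: {"name": "alice", "city": "Paris"}}
    changes: List[Change] = [
        ("+U", 1, {"name": None, "city": "Berlin"}),  # only city is updated
        ("+I", 2, {"name": "bob", "city": None}),     # new row, city not yet known
        ("+U", 2, {"name": None, "city": "Oslo"}),    # a later record fills in city
        ("-D", 1, {}),                                # row 1 is deleted
    ]
    print(merge_on_read(base, changes))  # prints {2: {'name': 'bob', 'city': 'Oslo'}}
```

Because the merge happens at read time, writers only append small change records, which keeps ingestion latency low; the cost shifts to readers until compaction folds the accumulated changes into a new base snapshot.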

Syllabus

Low latency Change Data Capture (CDC) to your data lake, using Apache Flink and Apache Paimon

Taught by

OSACon

Related courses

Efficient, Low Latency Ingestion to Large Tables via Apache Flink and Apache Iceberg

Apache Paimon - A Unified Data Lake for Stream, Batch, and OLAP Processing

CDC Stream Processing with Apache Flink

Real-Time Data Integration Practice Based on Flink CDC at Alibaba Cloud

Empowering Real-Time Data Integration with Flink CDC

Dynamic Change Data Capture with Flink CDC and Consistent Hashing