Overview
Explore Nextdoor's move from hourly batch event ingestion to a near-real-time streaming solution built on Delta Live Tables (DLT) in this 18-minute conference talk. See how the change lets internal users query events promptly for analysis, monitoring, and real-time aggregations while reducing compute costs. Topics include using file notification instead of directory listing to discover new files, implementing effective monitoring, and resolving conflicts between streaming and batch pipelines. Learn how custom Spark metrics help determine optimal data consumption points and how schema evolution in DLT accommodates changing event schemas. Presented by Kavin Palanisamy, Software Engineer on Nextdoor's Data Platform team, the talk offers practical lessons for data engineers and analysts working with real-time data ingestion and processing.
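As a rough illustration of two topics named above, file notification and schema evolution, the sketch below builds the kind of Auto Loader option set a DLT ingestion table might use. It is not taken from the talk: the helper name `autoloader_options`, the JSON format, and the `addNewColumns` mode are assumptions chosen for the example, and the commented-out pipeline snippet (including `EVENTS_PATH`) is hypothetical and only runs inside a Databricks DLT pipeline.

```python
def autoloader_options(file_format: str = "json",
                       use_notifications: bool = True) -> dict:
    """Build Auto Loader reader options as string key/value pairs.

    Hypothetical helper for illustration; the option keys are standard
    Auto Loader settings, the chosen values are assumptions.
    """
    return {
        # Format of the incoming event files (assumed JSON here).
        "cloudFiles.format": file_format,
        # "true" selects file notification mode; "false" falls back to
        # directory listing.
        "cloudFiles.useNotifications": str(use_notifications).lower(),
        # Add newly seen columns to the schema as events evolve.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
    }


options = autoloader_options()
for key, value in sorted(options.items()):
    print(f"{key}={value}")

# Inside a DLT pipeline (Databricks only, not runnable standalone), the
# options could be applied roughly like this:
#
#   import dlt
#
#   @dlt.table
#   def raw_events():
#       return (spark.readStream.format("cloudFiles")
#               .options(**options)
#               .load(EVENTS_PATH))  # EVENTS_PATH is a placeholder
```

Keeping the options in a plain dictionary makes it easy to switch between notification and directory-listing modes when comparing cost and latency.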
Syllabus
Efficient Near Real-Time Event Ingestion using DLT: Insights and Lessons
Taught by
Databricks