Overview
Explore data auditing and end-of-day processing in this 37-minute conference talk from NDC Conferences. The talk walks through Nielsen's Kafka architecture and ETL processes and the challenges of tracking data flow and preventing loss or duplication. Learn about the design process behind the data auditing system, Life Line, including Avro audit headers, auditing heartbeats, metadata design, and table optimization. See how to build an alert-based monitoring system and how to answer the recurring question of when the day has ended. The technologies covered include Kafka, Avro, Spark, Lambda functions, and complex SQL queries. The talk follows audit information from tracking and producing through analyzing and storing, then covers optimizing PostgreSQL for audit queries and implementing alerts and add-ons.
Syllabus
Intro
Nielsen's Architecture (AT THE TIME)
Data Arrival Pain Points
Recovering from Failures
Is it the end of day yet? Legacy answers to a legacy problem
Auditing Window
Auditing Header Injection
Designing Our Output Table
Optimizing PostgreSQL for Audit Queries
Alerts and add-ons
Taught by
NDC Conferences