Overview
Explore different streaming methods using Apache Spark and Kafka in this 34-minute conference talk by Itai Yaffe from Nielsen. Learn how Nielsen Marketing Cloud (NMC) transformed its data infrastructure to support real-time analytics for marketers and publishers. Discover the journey from CSV files and standalone Java applications to multiple Kafka and Spark clusters running a mixture of streaming and batch ETLs while supporting 10x data growth. Gain insights into the team's early adoption of Spark Streaming and Spark Structured Streaming, including the technical challenges they overcame. Examine a unique solution that uses Kafka to simulate streaming over a data lake, reducing cloud service costs. Topics include Kafka and Spark Streaming for stateless and stateful use cases, Spark Structured Streaming as an alternative, combining Spark Streaming with batch ETLs, and "streaming" over a data lake using Kafka.
Syllabus
Intro
Problems
What's Next
Local Aggregation
Weaknesses
Kafka
Summary
Recap
Big Data for Women
Questions
Taught by
Databricks