Designing the Next Generation of Data Pipelines with Apache Spark - Zillow's Approach

Databricks via YouTube

Overview

Explore how Zillow's data engineering team redesigned its data pipeline architecture with Apache Spark in this 27-minute conference talk. Learn about the challenge of balancing development speed against pipeline maintainability in a rapidly evolving organization. Discover how Zillow identified and addressed technical debt, improved data quality enforcement, consolidated shared pipeline functionality, and implemented complex business logic at scale. Gain insight into designing a new end-to-end pipeline architecture that improves robustness, maintainability, and scalability while reducing code complexity. Understand the pain points in pipeline development, maintenance, and scaling, and weigh the pros and cons of various ETL patterns. The talk covers establishing processing layers, developing the Pipeler Library, config-driven orchestration, separating data processing from business logic, and validating data early.
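
For readers unfamiliar with the patterns named above, the following is a minimal sketch of config-driven orchestration, early validation, and keeping business logic separate from generic processing. It is not Zillow's code and does not use the Pipeler Library or Spark: it uses plain Python lists of dicts so it runs anywhere, and every name in it (`PIPELINE_CONFIG`, `STEP_REGISTRY`, `run_pipeline`, the field names, the price threshold) is a hypothetical chosen for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical config: the pipeline's shape lives in data, not in code.
PIPELINE_CONFIG = {
    "required_fields": ["home_id", "list_price"],
    "steps": ["drop_non_positive_prices", "add_price_band"],
}


def validate(rows: List[Row], required_fields: List[str]) -> None:
    """Early validation: fail before any step runs if a required field is missing."""
    for i, row in enumerate(rows):
        missing = [f for f in required_fields if row.get(f) is None]
        if missing:
            raise ValueError(f"row {i} is missing required fields: {missing}")


def drop_non_positive_prices(rows: List[Row]) -> List[Row]:
    """Generic data processing: remove rows that cannot be priced."""
    return [r for r in rows if r["list_price"] > 0]


def add_price_band(rows: List[Row]) -> List[Row]:
    """Business logic (the 500,000 threshold is an illustrative assumption)."""
    return [
        {**r, "price_band": "high" if r["list_price"] >= 500_000 else "standard"}
        for r in rows
    ]


# Steps are looked up by the names used in the config.
STEP_REGISTRY: Dict[str, Callable[[List[Row]], List[Row]]] = {
    "drop_non_positive_prices": drop_non_positive_prices,
    "add_price_band": add_price_band,
}


def run_pipeline(rows: List[Row], config: dict) -> List[Row]:
    """Validate first, then apply the configured steps in order."""
    validate(rows, config["required_fields"])
    for step_name in config["steps"]:
        rows = STEP_REGISTRY[step_name](rows)
    return rows
```

In this sketch, reordering or removing a step means editing the config rather than the orchestration function, and a malformed input row raises an error before any transformation has run.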

Syllabus

Intro
What is Zillow Offers?
Original Architecture
New Architecture
Establish Processing Layers
Pipeler Library
Config-driven Orchestration
Data Processing vs. Business Logic
Validating Data Early

Taught by

Databricks
