Designing the Next Generation of Data Pipelines with Apache Spark - Zillow's Approach

Overview

Explore how Zillow's data engineering team revolutionized their data pipeline architecture using Apache Spark in this 27-minute conference talk. Learn about the challenges of balancing development speed with pipeline maintainability in a rapidly evolving organization. Discover how Zillow identified and addressed technical debt, improved data quality enforcement, consolidated shared pipeline functionality, and implemented scalable complex business logic. Gain insights into the process of designing a new end-to-end pipeline architecture that enhances robustness, maintainability, and scalability while reducing code complexity. Understand the pain points in pipeline development, maintenance, and scaling, and explore the pros and cons of various ETL patterns. Delve into Zillow's approach to creating more scalable and robust data pipelines using Apache Spark, including the establishment of processing layers, the development of a Pipeler Library, config-driven orchestration, separation of data processing and business logic, and early data validation techniques.
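The last three ideas in the overview (config-driven orchestration, separating data processing from business logic, and validating data early) can be illustrated with a minimal sketch. This is not Zillow's code or the Pipeler Library API: the config keys, step names, column names, and the price threshold below are all hypothetical, and plain Python lists of dicts stand in for Spark DataFrames so the example runs without a cluster.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical config: the ordered steps and the required columns live in
# data, not in orchestration code, so changing the pipeline means editing
# the config rather than the runner.
PIPELINE_CONFIG = {
    "required_columns": ["home_id", "list_price"],
    "steps": ["drop_missing_price", "flag_high_value"],
}


def validate_early(rows: List[Row], required: List[str]) -> None:
    """Fail before any processing starts if a required column is absent."""
    for i, row in enumerate(rows):
        missing = [col for col in required if col not in row]
        if missing:
            raise ValueError(f"row {i} is missing columns: {missing}")


def drop_missing_price(rows: List[Row]) -> List[Row]:
    """Data processing: generic cleanup with no domain rules."""
    return [row for row in rows if row["list_price"] is not None]


def flag_high_value(rows: List[Row]) -> List[Row]:
    """Business logic: the 1,000,000 threshold is purely illustrative."""
    return [{**row, "high_value": row["list_price"] > 1_000_000} for row in rows]


# Registry mapping config step names to functions (an assumed design, chosen
# so that processing steps and business rules stay in separate functions).
STEP_REGISTRY: Dict[str, Callable[[List[Row]], List[Row]]] = {
    "drop_missing_price": drop_missing_price,
    "flag_high_value": flag_high_value,
}


def run_pipeline(rows: List[Row], config: dict) -> List[Row]:
    """Validate first, then apply the configured steps in order."""
    validate_early(rows, config["required_columns"])
    for step_name in config["steps"]:
        rows = STEP_REGISTRY[step_name](rows)
    return rows
```

Calling `run_pipeline(rows, PIPELINE_CONFIG)` checks the input schema before any step runs, so a malformed batch is rejected up front instead of failing partway through. In a Spark setting the same shape would apply with DataFrame-to-DataFrame functions in place of the list-based steps.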

Syllabus

Intro
What is Zillow Offers?
Original Architecture
New Architecture
Establish Processing Layers
Pipeler Library
Config-driven Orchestration
Data Processing vs. Business Logic
Validating Data Early

Taught by

Databricks

Related Courses

Empowering Developers with Self-Service ETL - Zillow's Approach

Best Practices for Building and Deploying Data Pipelines in Apache Spark

Developing Scalable Machine Learning Pipelines for Gaming Industry

Declarative ETL Pipelines with Delta Live Tables - Modern Software Engineering for Data Analysts and Engineers
