Democratizing Data Quality Through a Centralized Platform at Zillow

Databricks via YouTube

Overview

This 54-minute Databricks conference talk describes how Zillow built a centralized platform for data quality management across thousands of datasets and pipelines. It covers the five pillars of the platform and its architecture, then walks through self-service onboarding: data discovery, rule-based approaches, and monitoring. The talk explains how validation libraries and pipeline integration flag data quality issues early, and how the platform lets teams define and view data quality expectations, run validations with Spark, and generate pipelines dynamically. Data quality metrics are exposed alongside the datasets themselves to give a picture of each dataset's health over time. The talk closes with future directions and key takeaways for implementing data quality management in a complex data organization.
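
As a rough illustration of the rule-based idea in the overview (declare expectations for a dataset, validate against them, and surface pass rates as metrics), here is a minimal sketch in plain Python. It is not Zillow's platform or any library's API: the `Expectation` class, the `validate` function, the column names, and the sample rows are all hypothetical, and the talk itself runs validations with Spark rather than in-memory lists.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Expectation:
    """A hypothetical data quality rule for one column."""
    name: str
    column: str
    check: Callable[[object], bool]   # returns True when a value is acceptable
    min_pass_ratio: float = 1.0       # fraction of rows that must pass


def validate(rows: List[Row], expectations: List[Expectation]) -> Dict[str, dict]:
    """Evaluate every expectation and return a pass ratio and verdict per rule."""
    results = {}
    total = len(rows)
    for exp in expectations:
        passed = sum(1 for row in rows if exp.check(row.get(exp.column)))
        ratio = passed / total if total else 1.0
        results[exp.name] = {"pass_ratio": ratio, "ok": ratio >= exp.min_pass_ratio}
    return results


# Made-up sample data, for illustration only.
rows = [
    {"listing_id": 1, "price": 350000},
    {"listing_id": 2, "price": None},
    {"listing_id": 3, "price": 425000},
    {"listing_id": 4, "price": 610000},
]

expectations = [
    Expectation("listing_id_not_null", "listing_id", lambda v: v is not None),
    Expectation(
        "price_positive",
        "price",
        lambda v: isinstance(v, (int, float)) and v > 0,
        min_pass_ratio=0.9,
    ),
]

report = validate(rows, expectations)
for rule, outcome in report.items():
    print(rule, outcome)
```

In this sketch the per-rule pass ratio plays the role of a data quality metric that could be stored and shown next to the dataset over time; a failed rule could be used to stop or flag a pipeline run early.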

Syllabus

Intro
About Zillow
Why Monitor Data Quality?
Challenges We Faced
5 Pillars for Data Quality Platform
Platform Architecture
Self-Service Onboarding - Goals
Self-Service Onboarding - Data Discovery
Self-Service Onboarding - Rule-based
Self-Service Onboarding Example
Self-Service Onboarding - Metrics
Self-Service Onboarding - Monitoring
Behind the Scenes
Validation Libraries
Pipeline Integration (before)
Pipeline Integration (after)
Validation Results
Future Direction
Key Takeaways

Taught by

Databricks
