Unsupervised Machine Learning for Scaling Data Quality Monitoring in Databricks

Databricks via YouTube

Overview

This 37-minute conference talk explains how unsupervised machine learning can be used to monitor data quality at scale in Databricks. It begins with the limitations of traditional rules-based and metrics-based approaches, then introduces a set of fully unsupervised algorithms, covering how they work, their strengths and weaknesses, and how they are tested and calibrated. A real-world example on ticket sales data shows how to set up monitoring in Anomalo and how to read the resulting visualizations: severity, explanation, distribution, and root cause analysis. The talk also walks through encoding features automatically, building a supervised model, and generating visualizations using SHAP values, and closes with implementation and testing challenges and how to get started in Databricks.
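
This listing does not say how the talk combines feature encoding with a supervised model. One common pattern, assumed here purely for illustration, is to label rows by time window (reference = 0, current = 1) and check how well a model can tell the two apart: a feature that separates them well has probably changed. The sketch below uses only the Python standard library. A one-threshold decision stump stands in for whatever model the talk uses, and the stump's accuracy stands in for SHAP attributions. The column names `price` and `venue`, the synthetic ticket data, and all function names are hypothetical.

```python
import random


def encode_column(values):
    """Encode a column automatically: numbers pass through, anything else gets an integer code."""
    if all(isinstance(v, (int, float)) for v in values):
        return [float(v) for v in values]
    codes = {v: i for i, v in enumerate(sorted(set(map(str, values))))}
    return [float(codes[str(v)]) for v in values]


def best_stump_accuracy(xs, labels):
    """Best training accuracy of a single-threshold classifier on one feature."""
    n = len(xs)
    ones = sum(labels)
    best = max(ones, n - ones) / n  # majority-class baseline
    for t in sorted(set(xs)):
        # Rule "predict 1 when x > t"; the flipped rule scores n - correct.
        correct = sum((x > t) == bool(y) for x, y in zip(xs, labels))
        best = max(best, correct / n, (n - correct) / n)
    return best


def drift_report(reference, current):
    """Score each column by how well it separates reference rows (0) from current rows (1)."""
    labels = [0] * len(reference) + [1] * len(current)
    rows = reference + current
    return {
        col: best_stump_accuracy(encode_column([r[col] for r in rows]), labels)
        for col in reference[0]
    }


# Synthetic ticket sales data (an assumption, not the talk's dataset).
rng = random.Random(0)
reference = [
    {"price": rng.gauss(50, 5), "venue": rng.choice(["A", "B", "C"])}
    for _ in range(200)
]
# In the current window the price distribution is unchanged,
# but every row reports the same venue, simulating a data quality issue.
current = [{"price": rng.gauss(50, 5), "venue": "C"} for _ in range(200)]

report = drift_report(reference, current)
for column, accuracy in sorted(report.items(), key=lambda kv: -kv[1]):
    print(f"{column}: separability {accuracy:.2f}")
```

A score near 0.5 means the column looks the same in both windows; a score well above 0.5 flags the column as a likely root cause. The stump is scored on its own training data, which is tolerable for a one-parameter model but would need a holdout split for anything more flexible.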

Syllabus

Intro
Data Quality in the Modern Data Stack
Three Approaches to Data Quality Monitoring
Ticket Sales Data
Setup Monitoring in Anomalo
Anomalo Monitoring
Chaos Library
Check Log
Visualizations: Severity & Explanation
Visualizations: Distribution
Visualizations: Root Cause Analysis
Encode Features Automatically
Build a Supervised Model
Generate Visualizations Using SHAP Values
Challenges
Testing
Get Started in Databricks
DATA+AI SUMMIT 2022

Taught by

Databricks
