Class Central Classrooms beta
YouTube videos curated by Class Central.
Classroom Contents
Architecting for Data Quality in the Lakehouse with Delta Lake and PySpark
- 1 Intro
- 2 Welcome
- 3 Introductions
- 4 Agenda
- 5 Data Quality Cone of Anxiety
- 6 How do we address bad data
- 7 What is data observability
- 8 Freshness
- 9 Distribution
- 10 Volume
- 11 Schema
- 12 Data Lineage
- 13 Data Reliability Lifecycle
- 14 Lake vs Warehouse
- 15 Metadata
- 16 Storage
- 17 Query logs
- 18 Query engine
- 19 Questions
- 20 Describe Detail
- 21 Architecture for observability
- 22 Measuring update times
- 23 Loading data in CSV or JSON
- 24 Update cadence
- 25 Feature engineering
- 26 Lambda function
- 27 Delay between updates
- 28 Model Parameters
- 29 Training Labels
- 30 Questions and Answers
- 31 Summary
- 32 Upcoming events
- 33 Data Quality Fundamentals
- 34 Monte Carlo