Architecting for Data Quality in the Lakehouse with Delta Lake and PySpark

Databricks via YouTube

Classroom Contents

  1. Intro
  2. Welcome
  3. Introductions
  4. Agenda
  5. Data Quality Cone of Anxiety
  6. How do we address bad data
  7. What is data observability
  8. Freshness
  9. Distribution
  10. Volume
  11. Schema
  12. Data Lineage
  13. Data Reliability Lifecycle
  14. Lake vs Warehouse
  15. Metadata
  16. Storage
  17. Query logs
  18. Query engine
  19. Questions
  20. Describe Detail
  21. Architecture for observability
  22. Measuring update times
  23. Loading data in CSV or JSON
  24. Update cadence
  25. Feature engineering
  26. Lambda function
  27. Delay between updates
  28. Model Parameters
  29. Training Labels
  30. Questions and Answers
  31. Summary
  32. Upcoming events
  33. Data Quality Fundamentals
  34. Monte Carlo