Class Central Classrooms beta
YouTube videos curated by Class Central.
Classroom Contents
Declarative ETL Pipelines with Delta Live Tables - Modern Software Engineering for Data Analysts and Engineers
- 1 Intro
- 2 What is a Streaming Live Table? Based on Spark™ Structured Streaming
- 3 Development vs Production. Fast iteration or enterprise-grade reliability.
- 4 Choosing pipeline boundaries. Break up pipelines at natural external divisions.
- 5 Pitfall: hard-code sources & destinations. Problem: Hard coding the source & destination makes it impossible to test changes outside of production, breaking CI/CD.
- 6 Ensure correctness with Expectations. Expectations are tests that ensure data quality in production.
- 7 Expectations using the power of SQL. Use SQL aggregates and joins to perform complex validations.
- 8 Using Python. Write advanced DataFrame code and UDFs.
- 9 Installing libraries with pip. pip is a package installer for Python.
- 10 Best Practice: Integrate using the event log. Use the information in the event log with your existing operational tools.
- 11 DLT Automates Failure Recovery. Transient issues are handled by built-in retry logic.
- 12 Modularize your code with configuration. Avoid hard coding paths, topic names, and other constants in your code.
- 13 Workflow Orchestration For Triggered DLT Pipelines
- 14 Use Delta for infinite retention. Delta provides cheap, elastic and governable storage for transient sources.