How Apache Spark 3.0 and Delta Lake Enhance Data Lake Reliability

Databricks via YouTube

Classroom Contents

  1. Intro
  2. Databricks
  3. Deep Dive into the New Features of Apache Spark 3.0
  4. A Delta Lake 0.7.0 + Spark 3.0 AMA
  5. Spark Catalyst Optimizer
  6. Adaptive Query Execution (AQE)
  7. Apache Spark™ 3.0 AQE Fundamentals
  8. Starting with Broadcast Hash Joins
  9. Dynamically Switching Join Strategies: Apache Spark 3.0 AQE Fundamentals
  10. Dynamically Coalescing Shuffle Partitions: Apache Spark 3.0 AQE Fundamentals
  11. Dynamically Optimize Skew Joins
  12. TPC-DS performance gains from AQE
  13. Dynamic Partition Pruning: Before Optimiza
  14. How to Use Join Hints? Broadcast Hash Join
  15. Extensibility and Ecosystem
  16. Data Source V2
  17. But what happens with DML under the cover? What really happens to the file system when you run delete, update, and merge?
  18. Time Travel: The transaction log and additive files - data versioning
  19. Control Table History Retention
  20. Enable DataSourceV2 and Catalog API Integration
  21. Data Quality Framework: Improved SQL DDL and DML, and ACID transactions, are just the start
  22. Lakehouse Paradigm: Improved performance. DW-like capabilities, on low-cost cloud object stores
  23. Try out Spark 3.0 + Delta Lake now!
