Explore the design and implementation of a real-time data lake capable of handling dynamically changing schemas in this 25-minute presentation from Databricks. Learn how to build a robust streaming ETL pipeline that can adapt to changing schemas and new event types without downtime. Discover techniques for inferring schemas on the fly, tracking and storing schemas without a schema registry, and adjusting underlying tables automatically. Gain insights into deploying and managing hundreds of streams operationally on Databricks, and understand the cost and performance implications for growing ingestion loads from data providers. Dive into key topics such as schema variation hashing, batch processing, schema repository management, and essential takeaways for implementing this approach in production environments.
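To make "schema variation hashing" and a registry-free schema repository more concrete, here is a minimal sketch in plain Python. It is not taken from the presentation: the names `infer_schema`, `schema_hash`, and `SchemaRepository` are hypothetical, the repository is an in-memory dictionary rather than a real table, and the type names are illustrative. The idea it illustrates is that each incoming event's structure is reduced to a canonical form and hashed, so a new hash signals a schema variation the pipeline has not seen before.

```python
import hashlib
import json


def infer_schema(value):
    """Map a decoded JSON value to a nested description of its types."""
    if isinstance(value, dict):
        return {key: infer_schema(val) for key, val in value.items()}
    if isinstance(value, list):
        # Assumption: arrays are homogeneous, so the first element describes them.
        return [infer_schema(value[0])] if value else []
    if isinstance(value, bool):  # checked before int, since bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if value is None:
        return "null"
    return "string"


def schema_hash(schema):
    """Hash a canonical (key-sorted) JSON rendering of the schema."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SchemaRepository:
    """Hypothetical in-memory store of schema variations, keyed by hash."""

    def __init__(self):
        self._schemas = {}

    def register(self, event):
        """Return (hash, is_new) for the schema inferred from one event."""
        schema = infer_schema(event)
        key = schema_hash(schema)
        is_new = key not in self._schemas
        self._schemas.setdefault(key, schema)
        return key, is_new


repo = SchemaRepository()
first_key, first_is_new = repo.register({"id": 1, "name": "a"})
# Same fields in a different order hash to the same variation.
second_key, second_is_new = repo.register({"name": "b", "id": 2})
# An extra field produces a new variation.
third_key, third_is_new = repo.register({"id": 3, "name": "c", "ts": 1.5})
```

In a production pipeline, a newly seen hash would presumably be the trigger for adjusting the underlying table. How that is done on Databricks is covered in the presentation and is not shown here.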