Data Lake - Design for Schema Evolution

EuroPython Conference via YouTube

Overview

This EuroPython 2021 talk covers the challenges of managing schema evolution in a data lake and the design practices that address them: storage, control, scalability, and availability. It walks through how Episource stored and searched evolving nested JSON data produced by its NLP engine, which processes millions of medical documents. The solution uses the Avro format for schema evolution, a schema registry for version control, and Amazon Athena for distributed SQL queries. The talk also explains how the "schema-on-write" and "schema-on-read" approaches each help maintain data integrity and compatibility across schema changes.
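
To make the idea of schema evolution concrete, here is a minimal sketch of the rule that lets old and new records coexist: a reader fills fields it does not find with the declared default, and ignores fields it does not know about. This is a simplified, pure-Python illustration of Avro-style resolution, not the talk's implementation or a real Avro library. The schemas, field names, sample values, and the `resolve` helper are all hypothetical.

```python
# Hypothetical schemas written as Avro-style dicts; the field names are
# illustrative and are not taken from the talk.
SCHEMA_V1 = {
    "type": "record",
    "name": "Extraction",
    "fields": [
        {"name": "doc_id", "type": "string"},
        {"name": "codes", "type": {"type": "array", "items": "string"}},
    ],
}

# Version 2 adds a field WITH a default, which keeps it compatible with
# records written under version 1.
SCHEMA_V2 = {
    "type": "record",
    "name": "Extraction",
    "fields": [
        {"name": "doc_id", "type": "string"},
        {"name": "codes", "type": {"type": "array", "items": "string"}},
        {"name": "model_version", "type": "string", "default": "unknown"},
    ],
}


def resolve(record, reader_schema):
    """Project a record onto reader_schema (simplified, top-level fields only).

    - Field present in the record: copied as-is.
    - Field missing but the reader declares a default: default is used.
    - Field missing with no default: the schemas are incompatible.
    - Field in the record but not in the reader schema: dropped.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return resolved


# A record written under V1 and one written under V2 (made-up sample data).
old_record = {"doc_id": "a1", "codes": ["E11.9"]}
new_record = {"doc_id": "b2", "codes": [], "model_version": "2.0"}

# New reader, old data: the added field is filled from its default.
print(resolve(old_record, SCHEMA_V2))
# Old reader, new data: the unknown field is ignored.
print(resolve(new_record, SCHEMA_V1))
```

A real Avro reader resolves the writer's schema against the reader's schema rather than inspecting each record, and it also handles nested records, unions, and type promotion. The sketch only shows why adding a field with a default is a safe change while adding one without a default is not.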

Syllabus

Prakshi Yadav - Data lake: Design for schema evolution

Taught by

EuroPython Conference
