Overview
Learn about data lake management and version control in this 24-minute conference talk from the Presto Foundation. Explore how Git-like operations can be applied to files in object storage to make data lake management more efficient and reliable. Discover approaches to common challenges in data-intensive applications, including ensuring data quality, running experiments, and recovering from corrupted data. Gain insight into chaos engineering principles for distributed data systems and how modern tools can make data workloads more resilient. Follow the evolution from early data processing challenges to today's manageability issues, and see how technologies such as Kafka, Spark, Presto, and Snowflake have transformed big data operations. Pick up techniques for developing data-intensive applications faster while maintaining high data quality, illustrated with practical demonstrations of open-source tooling.
Syllabus
A Git-like Repository for your Data Lake - Vinodhini Sivakami Duraisamy, Treeverse
Taught by
Presto Foundation