This conference talk from YOW! 2019 examines the design of data pipelines and three recurring problems in running them: reproducing earlier results, tracking which version of the code produced which output, and handling changes to the algorithms a pipeline uses. It introduces "pure" data pipelines and shows how techniques borrowed from distributed build systems can make results traceable, preserve previous results, and make workflows more efficient. The ideas are illustrated with concrete examples in several programming languages and distributed computation frameworks. The talk is aimed at data scientists, analytics professionals, and anyone who designs or maintains data processing systems.