Getting ML Right in a Complex Data World

Overview

Explore machine learning workflows in a complex data environment in this conference talk. Examine the iterative nature of ML experimentation, with a focus on data labeling, cleaning, preprocessing, and feature selection. Learn why quality ML at scale depends on being able to reproduce specific experiment iterations, and why data versioning is crucial to doing so. Discover how open-source tools version ML experiments without duplicating code, data, and models, potentially reducing storage costs. Through a live code demonstration, gain practical insight into creating a basic ML experimentation framework, reproducing ML components from specific iterations, and building intuitive, zero-maintenance experiment infrastructure with open-source tooling.
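
To make the idea of versioning experiments without duplicating data concrete, here is a minimal sketch of content-addressed storage, one common way such tools avoid storing the same artifact twice. It is an illustration under stated assumptions, not the tooling or code shown in the talk: the ExperimentStore class, its commit and checkout methods, and the artifact names are all hypothetical, and everything is held in memory rather than on disk.

```python
import hashlib
import json


class ExperimentStore:
    """Toy content-addressed store (hypothetical): identical artifacts are kept once."""

    def __init__(self):
        self.blobs = {}      # content hash -> artifact bytes
        self.snapshots = {}  # iteration name -> {artifact name: content hash}

    def commit(self, iteration, artifacts):
        """Record one experiment iteration, given {artifact name: bytes}."""
        manifest = {}
        for name, content in artifacts.items():
            digest = hashlib.sha256(content).hexdigest()
            # Store the bytes only if this exact content has not been seen before.
            self.blobs.setdefault(digest, content)
            manifest[name] = digest
        self.snapshots[iteration] = manifest
        return manifest

    def checkout(self, iteration):
        """Return the exact artifacts recorded for a past iteration."""
        manifest = self.snapshots[iteration]
        return {name: self.blobs[digest] for name, digest in manifest.items()}


store = ExperimentStore()
raw = b"id,label\n1,cat\n2,dog\n"  # made-up example dataset

# Two iterations share the same data and differ only in one hyperparameter.
store.commit("iter-1", {"data.csv": raw, "params.json": json.dumps({"lr": 0.1}).encode()})
store.commit("iter-2", {"data.csv": raw, "params.json": json.dumps({"lr": 0.01}).encode()})

print(len(store.blobs))                        # number of distinct stored artifacts
print(store.checkout("iter-1")["params.json"])  # parameters exactly as committed
```

Because each snapshot stores only hashes, a new iteration that reuses an unchanged dataset adds a small manifest entry rather than another copy of the data, while any earlier iteration can still be restored exactly.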