Building Reproducible ML Processes with an Open Source Stack

Overview

Learn how to create reproducible machine learning experiments in this 31-minute conference talk from the Toronto Machine Learning Series. Explore the essential components of a reproducible ML process, including MLflow Projects for code reproducibility, lakeFS for data versioning, and infrastructure as code for environment consistency. Follow a practical code demonstration that recreates an experiment using the same input data, code, and processing environment as a previous run. Along the way, learn how to create data snapshots through commits, tag them so they can be found again, and keep the history of code and data in sync using an open-source technology stack.
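
As a rough illustration of what keeping code and data history in sync can mean in practice, the sketch below pins one run to a code commit, a data commit, and an environment identifier. It is not taken from the talk. The RunManifest class, its field names, and all example values are hypothetical, and it uses only the Python standard library rather than the MLflow or lakeFS clients. The one outside convention it assumes is the lakeFS URI form lakefs://<repository>/<ref>/<path>, where the ref may be a commit ID or a tag.

```python
from dataclasses import dataclass, asdict
import hashlib
import json


@dataclass(frozen=True)
class RunManifest:
    """Hypothetical record of everything needed to recreate one run."""

    code_commit: str   # Git commit SHA of the training code
    data_repo: str     # name of the data repository
    data_ref: str      # immutable data reference: a commit ID or a tag
    environment: str   # identifier of the environment definition (assumed)

    def data_uri(self, path: str) -> str:
        # Address the data through the pinned ref instead of a moving branch,
        # so a rerun reads exactly the same objects.
        return f"lakefs://{self.data_repo}/{self.data_ref}/{path.lstrip('/')}"

    def fingerprint(self) -> str:
        # Stable hash over all pinned components; two runs with the same
        # fingerprint used the same code, data, and environment identifiers.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


# Example values are placeholders, not real commits or images.
original = RunManifest("a1b2c3d", "example-repo", "experiment-1", "env-v1")
rerun = RunManifest("a1b2c3d", "example-repo", "experiment-1", "env-v1")
changed_data = RunManifest("a1b2c3d", "example-repo", "experiment-2", "env-v1")

print(original.data_uri("/train/data.parquet"))
print(original.fingerprint() == rerun.fingerprint())
print(original.fingerprint() == changed_data.fingerprint())
```

Storing such a fingerprint alongside each run, for example as a tag in the experiment tracker, is one possible way to check later that a rerun used the same three inputs. The design choice here is to reference data only through an immutable ref, since a branch name can point at different data over time.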