This course helps students work more efficiently, effectively, and productively on modern, real-world ML projects by adopting best practices for reproducible workflows. In particular, it teaches the fundamentals of MLOps and how to: a) create a clean, organized, reproducible, end-to-end machine learning pipeline from scratch using MLflow; b) clean and validate data using pytest; c) track experiments, code, and results using GitHub and Weights & Biases; d) select the best-performing model for production; and e) deploy a model using MLflow. Along the way, it also touches on other technologies, such as Kubernetes, Kubeflow, and Great Expectations, and how they relate to the course content.
Syllabus
- Introduction to Reproducible Model Workflows
  - Dive into reproducible model workflows and machine learning operations: their use cases, their history, and what you'll build by the end of the course.
- Machine Learning Pipelines
  - Build machine learning pipelines and learn how to version data and model artifacts.
- Data Exploration and Preparation
  - Develop reusable processes for exploratory data analysis (EDA), data cleaning and pre-processing, and data segregation (splitting).
- Data Validation
  - Validate data with deterministic and non-deterministic tests, and learn how to handle different test parameters with pytest.
- Training, Validation and Experiment Tracking
  - Write an inference pipeline, validate and choose the best-performing models from your experiments, and test your final model artifacts.
- Final Pipeline, Release and Deploy
  - Write a full end-to-end pipeline, release it, and deploy it with MLflow.
- Build an ML Pipeline for Short-term Rental Prices in NYC
  - Create a reusable end-to-end pipeline for predicting short-term rental prices in New York City!
Taught by
nd0821 Giacomo Vianello