Overview
Explore the intricacies of tuning machine learning models in this 24-minute conference talk from Databricks. Delve into the automation of hyperparameter tuning, scaling techniques using Apache Spark, and best practices for optimizing workflows and architecture. Learn how to leverage Hyperopt, a popular open-source tool for ML tuning in Python, and discover its Spark-powered backend for enhanced scalability. Gain insights into effective tuning workflows, including how to select parameters, track progress, and iterate using MLflow. Examine architectural patterns for both single-machine and distributed ML workflows, and understand how to optimize data ingestion with Spark. Discover the potential of joblib-spark for distributing scikit-learn tuning jobs across Spark clusters. While generally accessible, this talk is particularly valuable for those with knowledge of machine learning and Spark.
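To make the tooling mentioned above concrete, here is a minimal sketch (not taken from the talk itself) of tuning a scikit-learn model with Hyperopt's SparkTrials backend inside an MLflow run. The dataset, search space, and parameter ranges are illustrative assumptions.

```python
import mlflow
from hyperopt import fmin, tpe, hp, SparkTrials, STATUS_OK
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)  # illustrative dataset

def objective(params):
    clf = RandomForestClassifier(
        n_estimators=int(params["n_estimators"]),
        max_depth=int(params["max_depth"]),
        random_state=0,
    )
    accuracy = cross_val_score(clf, X, y, cv=3).mean()
    # Hyperopt minimizes, so report the negative accuracy as the loss.
    return {"loss": -accuracy, "status": STATUS_OK}

search_space = {
    "n_estimators": hp.quniform("n_estimators", 50, 300, 25),
    "max_depth": hp.quniform("max_depth", 3, 12, 1),
}

# SparkTrials fans trials out across a Spark cluster; parallelism caps how
# many trials run at once. Wrapping fmin in an MLflow run groups the trials
# so progress can be tracked and compared across iterations.
with mlflow.start_run():
    best = fmin(
        fn=objective,
        space=search_space,
        algo=tpe.suggest,
        max_evals=40,
        trials=SparkTrials(parallelism=4),
    )
print(best)
```

For the joblib-spark approach also mentioned in the overview, a minimal sketch, assuming the joblibspark package is installed alongside pyspark: registering the Spark joblib backend lets scikit-learn's own tuning utilities distribute their fits as Spark tasks.

```python
from joblib import parallel_backend
from joblibspark import register_spark
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

register_spark()  # register the "spark" joblib backend

X, y = load_digits(return_X_y=True)  # illustrative dataset
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}

# Each cross-validation fit runs as a Spark task instead of a local process.
with parallel_backend("spark", n_jobs=8):
    search = GridSearchCV(SVC(), param_grid, cv=3)
    search.fit(X, y)

print(search.best_params_)
```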
Syllabus
Introduction
What are hyperparameters
Tuning ML models
Hyperparameters
Single Machine Training
Distributed Training
Training One Model Per Group
Workflows
Models vs Pipelines
Resources
Taught by
Databricks