Overview
In this 32-minute conference talk, Kaleb Lowe, Staff Machine Learning Engineer at data.ai, explains how the company's machine learning team uses the Databricks Platform to apply MLOps best practices to high-frequency retraining. The talk covers the framework the team built to bring MLOps into the weekly retraining of approximately 50,000 sklearn models in parallel. It shows how Pandas UDFs can apply arbitrary code to each group of a dataset, which makes MLflow logging and model registration possible at scale for any grouped data. It also discusses the challenges of parallelizing model training across many categories and countries, the limitations of the approach, and how the methodology could be adapted for more time-sensitive use cases. The talk is aimed at data scientists and machine learning engineers working on large-scale model retraining projects.
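For readers unfamiliar with the grouped Pandas UDF pattern the talk is built on, the following is a minimal sketch and not the speaker's actual framework. The column names (country, category, x, y), the result schema, and both function names are hypothetical, and a NumPy least-squares line fit stands in for an sklearn estimator so that the per-group function can be run without a cluster. The Spark wiring uses the grouped-map form `groupBy(...).applyInPandas(...)` from PySpark 3.x.

```python
import numpy as np
import pandas as pd

# Hypothetical grouping keys and output schema; the talk listing does not
# specify the real data layout.
GROUP_COLS = ["country", "category"]
RESULT_SCHEMA = (
    "country string, category string, n_rows long, "
    "slope double, intercept double, rmse double"
)


def train_one_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit one small model on the rows of a single (country, category) group.

    Spark calls this once per group, passing that group's rows as a
    pandas DataFrame, and expects a pandas DataFrame back.
    """
    x = pdf["x"].to_numpy(dtype=float)
    y = pdf["y"].to_numpy(dtype=float)

    # Stand-in for an sklearn estimator: a degree-1 least-squares fit.
    slope, intercept = np.polyfit(x, y, deg=1)
    rmse = float(np.sqrt(np.mean((slope * x + intercept - y) ** 2)))

    # Per-group experiment tracking (for example MLflow runs, metrics, and
    # model registration) would be placed here; it is omitted from this sketch.
    return pd.DataFrame(
        [
            {
                "country": pdf["country"].iloc[0],
                "category": pdf["category"].iloc[0],
                "n_rows": int(len(pdf)),
                "slope": float(slope),
                "intercept": float(intercept),
                "rmse": rmse,
            }
        ]
    )


def train_all_groups(spark_df):
    """Run train_one_group on every group in parallel on a Spark DataFrame."""
    return spark_df.groupBy(*GROUP_COLS).applyInPandas(
        train_one_group, schema=RESULT_SCHEMA
    )
```

Because the per-group function only touches pandas objects, it can be checked locally before it is handed to Spark, for example with `pd.concat(train_one_group(g) for _, g in df.groupby(GROUP_COLS))` on a small pandas DataFrame `df`.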
Syllabus
Scaling MLOps to Retrain 50k Weekly Models in Parallel Using UDFs
Taught by
Databricks