The Killer Feature Store - Orchestrating Spark ML Pipelines and MLflow for Production
Databricks via YouTube
Overview
Explore the concept of feature stores in data architecture and their role in productionizing ML applications in this 25-minute conference talk. Learn about the challenges of managing data and deploying applications in experimental, data-driven research environments, particularly in production ML pipelines whose modeling and featurization stages depend on one another. Discover how to implement a feature store with Spark and MLflow that acts as an orchestration engine for ML pipeline stages, going beyond its traditional role as a metadata repository. Gain insights into breaking down ML pipeline deployment, avoiding the 'clone and own' anti-pattern, and isolating pipeline orchestration concerns. Examine novel algorithms for orchestrating pipeline stages, data models for feature stage metadata, and concrete system designs built on open source tools. Understand the state of feature stores in industry through a survey of reference architectures, open source repositories, and client experiences. Walk away with practical system designs and algorithms to inspire your own feature store implementation.
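The talk's own data model and orchestration algorithm are not reproduced here. As a rough illustration of a feature store that orchestrates stages rather than only recording metadata, the sketch below assumes a hypothetical `StageSpec` record in which each stage declares the features it consumes and produces, and a hypothetical `FeatureStoreOrchestrator` that derives execution order from that feature lineage using Python's standard `graphlib.TopologicalSorter` (Python 3.9+). It is plain Python, not the Spark or MLflow API; in a real system each `run` callable might wrap a Spark ML transformer, with runs tracked in MLflow.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


# Hypothetical stage metadata: declared input and output feature names.
# Not a Spark or MLflow API; an illustrative data model only.
@dataclass
class StageSpec:
    name: str
    inputs: List[str]
    outputs: List[str]
    run: Callable[[Dict[str, object]], Dict[str, object]]


class FeatureStoreOrchestrator:
    """Toy registry that orders stages by their declared feature dependencies."""

    def __init__(self) -> None:
        self._stages: Dict[str, StageSpec] = {}
        self._producer: Dict[str, str] = {}  # feature name -> producing stage

    def register(self, stage: StageSpec) -> None:
        # Validate before mutating so a rejected stage leaves no partial state.
        if stage.name in self._stages:
            raise ValueError(f"stage {stage.name!r} already registered")
        for feature in stage.outputs:
            if feature in self._producer:
                raise ValueError(
                    f"feature {feature!r} already produced by {self._producer[feature]!r}"
                )
        for feature in stage.outputs:
            self._producer[feature] = stage.name
        self._stages[stage.name] = stage

    def plan(self) -> List[str]:
        # Map each stage to the upstream stages that produce its inputs;
        # inputs with no registered producer are treated as raw data.
        graph = {
            name: {self._producer[f] for f in spec.inputs if f in self._producer}
            for name, spec in self._stages.items()
        }
        return list(TopologicalSorter(graph).static_order())

    def run(self, raw: Dict[str, object]) -> Dict[str, object]:
        # Execute stages in dependency order, accumulating computed features.
        features = dict(raw)
        for name in self.plan():
            spec = self._stages[name]
            missing = [f for f in spec.inputs if f not in features]
            if missing:
                raise KeyError(f"stage {name!r} missing inputs {missing}")
            features.update(spec.run({f: features[f] for f in spec.inputs}))
        return features


# Usage: stages are registered out of order; the orchestrator sorts them.
store = FeatureStoreOrchestrator()
store.register(StageSpec("segment", ["token_count"], ["segment"],
                         lambda f: {"segment": "long" if f["token_count"] > 3 else "short"}))
store.register(StageSpec("tokenize", ["text"], ["tokens"],
                         lambda f: {"tokens": f["text"].lower().split()}))
store.register(StageSpec("count", ["tokens"], ["token_count"],
                         lambda f: {"token_count": len(f["tokens"])}))

print(store.plan())  # ['tokenize', 'count', 'segment']
result = store.run({"text": "Feature stores orchestrate pipeline stages"})
print(result["token_count"], result["segment"])  # 5 long
```

Keeping dependency information in stage metadata, rather than hard-coding call order in each pipeline copy, is one way to isolate orchestration concerns from featurization and modeling code.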
Syllabus
Introduction
Common Problem
What's the effort
Semantics
Machine Learning Example
Customer Segmentation Example
Train Test Split Example
Feature Management
Automation
ML Pipeline
Pipeline Overview
Why does it exist
Pipeline deployment
Pipeline stage declaration
Pipeline construction
Vectorizing text
Demo
ML pipeline orchestration API
Taught by
Databricks