The Killer Feature Store - Orchestrating Spark ML Pipelines and MLflow for Production

Overview

Explore the concept of feature stores in data architecture and their role in productionizing ML applications through this 25-minute conference talk. Learn about the challenges of managing data and deploying applications in experimental, data-driven research environments, particularly in production ML pipelines with interdependent modeling and featurization stages. Discover how to implement a feature store as an orchestration engine for ML pipeline stages using Spark and MLflow, going beyond the traditional role of a metadata repository. Gain insights into breaking down ML pipeline deployment, avoiding the 'clone and own' anti-pattern, and isolating pipeline orchestration concerns. Explore novel algorithms for pipeline stage orchestration, data models for feature stage metadata, and concrete system designs using open source tools. Understand the state of feature stores in industry through a survey of reference architectures, open source repositories, and client experiences. Walk away with practical system designs and innovative algorithms to inspire your own feature store implementation.
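
To make the idea of orchestrating interdependent pipeline stages concrete, here is a minimal sketch of dependency-ordered stage execution. It is not the algorithm presented in the talk. The stage names, the `STAGE_DEPENDENCIES` mapping, and the `execution_order` and `run_pipeline` helpers are all hypothetical, and the sketch uses only the Python standard library (3.9+) in place of Spark and MLflow.

```python
from graphlib import TopologicalSorter

# Hypothetical stages: each name maps to the set of stages it depends on.
# The names are illustrative only and are not taken from the talk.
STAGE_DEPENDENCIES = {
    "tokenize_text": set(),
    "vectorize_text": {"tokenize_text"},
    "customer_segments": {"vectorize_text"},
    "train_test_split": {"customer_segments"},
    "train_model": {"train_test_split"},
}


def execution_order(dependencies):
    """Return stage names so that every stage comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def run_pipeline(dependencies, stage_functions):
    """Run each stage in dependency order, handing it its upstream outputs."""
    outputs = {}
    for name in execution_order(dependencies):
        upstream = {dep: outputs[dep] for dep in dependencies[name]}
        outputs[name] = stage_functions[name](upstream)
    return outputs


if __name__ == "__main__":
    # Placeholder stage bodies that just report which inputs they received.
    stages = {
        name: (lambda upstream, n=name: f"{n} <- {sorted(upstream)}")
        for name in STAGE_DEPENDENCIES
    }
    for stage, result in run_pipeline(STAGE_DEPENDENCIES, stages).items():
        print(stage, "=>", result)
```

Declaring dependencies as data, rather than hard-coding the call order, is one way to keep orchestration separate from the featurization and modeling code. In a real deployment each placeholder function would be replaced by an actual Spark or MLflow stage.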

Syllabus

Introduction
Common Problem
What's the effort
Semantics
Machine Learning Example
Customer Segmentation Example
Train Test Split Example
Feature Management
Automation
ML Pipeline
Pipeline Overview
Why does it exist
Pipeline deployment
Pipeline stage declaration
Pipeline construction
Vectorizing text
Demo
ML pipeline orchestration API

Taught by

Databricks

Related Courses

Scaling Data and ML with Apache Spark and Feast - Feature Engineering for Production

Enable Production ML with Databricks Feature Store

TFX- Production ML Pipelines with TensorFlow

Hopsworks - The Python-Centric Enterprise Feature Store

Building Real-Time ML Features with a Feature Platform

Accelerating the ML Lifecycle with Enterprise-Grade Feature Stores
