Transform Your Machine Learning Pipelines with Apache Hudi

Overview

In this 25-minute conference talk, Nadine Farah of Onehouse shows how Apache Hudi can improve machine learning pipelines built on data lakes. The talk first covers why traditional data lakes struggle to keep ML data fresh, accurate, and available in near real time. It then explains how Hudi addresses these problems with upserts, incremental processing, and near real-time data access. Finally, it shows how to build efficient ML pipelines with Hudi features such as time-travel queries and incremental data pulls, so that updates reach models with less latency.

Syllabus

Unveil the Magic Without Hoodini: Transform Your Machine Learning Pipelines with Apa... Nadine Farah

Taught by

Linux Foundation

Related Courses

Machine Learning with Apache Spark

Data Alchemy: Transforming Raw Data to Gold with Apache Hudi and DBT

Scalable Machine Learning on Big Data using Apache Spark

Data Engineering, Big Data, and Machine Learning on GCP

Exploring New Frontiers: How Apache Flink, Apache Hudi and Presto Power New Insights at Scale

How to Speed Up Your Lakehouse Queries by an Order of Magnitude with Multi-modal Index Subsystem Using Apache Hudi and Presto

10 Best Machine Learning Courses for 2024: Scikit-learn, TensorFlow, and more