The Daft Distributed Python Data Engine: Multimodal Data Curation at Any Scale

MLOps.community via YouTube

Overview

This 27-minute talk by Jay Chia introduces Daft, a distributed Python data engine for multimodal data curation at any scale. It covers three fundamental needs of ML/AI data platforms that Daft is designed to address: terabyte-scale ETL with complex model batch inference, SQL analytics over multimodal data types, and performant data loading for model training and inference. The talk explains why other tools fall short of these requirements and walks through a full example of building a highly performant data platform with the Daft DataFrame and open file formats such as JSON and Parquet. Jay draws on his experience in ML infrastructure across the biotech and autonomous driving industries to show how Daft fits into data curation for ML/AI projects.

Syllabus

The Daft distributed Python data engine: multimodal data curation at any scale // Jay Chia // DE4AI

Taught by

MLOps.community
