Overview
Explore Daft, a distributed Python data engine for curating multimodal data at any scale, in this 27-minute talk by Jay Chia. Learn how Daft addresses three core needs of ML/AI data platforms: terabyte-scale ETL that includes batch inference with complex models, SQL analytics over multimodal data types, and performant data loading for model training and inference. Find out why other tools fall short of these requirements, and follow a full example of building a high-performance data platform with the Daft DataFrame and open file formats such as JSON and Parquet. Throughout, Jay draws on his experience in ML infrastructure in the biotech and autonomous driving industries to show how Daft can change the way you approach data curation for ML/AI projects in 2024 and beyond.
Syllabus
The Daft distributed Python data engine: multimodal data curation at any scale // Jay Chia // DE4AI
Taught by
MLOps.community