Data Versioning in Generative AI - A Pathway to Cost-effective Machine Learning
MLOps World: Machine Learning in Production via YouTube
Overview
Explore data versioning for generative AI in this 35-minute conference talk by Dmitry Petrov, CEO of DVC. Generative AI workflows rely on large volumes of unstructured data, including images, videos, audio, MRI scans, document scans, and text dialogues, and the talk covers the challenges this creates. Learn strategies for minimizing processing time and reducing API calls to external models like ChatGPT, which can lower costs significantly. Discover methods for sharing datasets among ML researchers to improve collaboration. See how generative AI has changed data versioning compared with traditional ML, and learn techniques for versioning annotations, embeddings, and auto-labels alongside the data itself. By the end of the talk, you should understand how data management is evolving in the era of generative AI and how it affects cost-effective machine learning.
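As a rough illustration of how versioning derived artifacts can reduce repeated API calls, here is a minimal sketch, not taken from the talk. It assumes each annotation or embedding is stored under a key computed from the raw file bytes plus a model identifier, so an unchanged file processed by an unchanged model is never sent to the model again. The names `content_key`, `AnnotationCache`, and `get_or_compute` are hypothetical, and the `compute` callback stands in for whatever external model call a real pipeline would make.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def content_key(data: bytes, model_id: str) -> str:
    """Build a key from the model identifier and the file's bytes (hypothetical helper)."""
    h = hashlib.sha256()
    h.update(model_id.encode("utf-8"))
    h.update(b"\0")  # separator so the model id and the data cannot run together
    h.update(data)
    return h.hexdigest()


class AnnotationCache:
    """Store model outputs on disk, keyed by data content and model version (illustrative only)."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def get_or_compute(
        self,
        data: bytes,
        model_id: str,
        compute: Callable[[bytes], dict],
    ) -> dict:
        path = self.root / f"{content_key(data, model_id)}.json"
        if path.exists():
            # Same bytes and same model version: reuse the stored result.
            return json.loads(path.read_text(encoding="utf-8"))
        # New data or new model version: call the model once and store the result.
        result = compute(data)
        path.write_text(json.dumps(result), encoding="utf-8")
        return result
```

Under these assumptions, changing either the file contents or the model identifier produces a new key, so stale annotations are not reused, while identical inputs cost one model call in total.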
Syllabus
Data Versioning in Generative AI: A Pathway to Cost-effective ML
Taught by
MLOps World: Machine Learning in Production