
Retooling AI Training Sets for Improved Model Performance

Snorkel AI via YouTube

Overview

Explore the importance of training datasets in AI breakthroughs through this 26-minute talk by Ludwig Schmidt, Assistant Professor of Computer Science at the University of Washington. Learn about DataComp, a benchmark designed to shift research focus from model architectures to dataset design. Discover how researchers propose new training sets by filtering a fixed pool of 12.8B image-text pairs drawn from Common Crawl. Understand the evaluation process, which uses standardized CLIP training code and 38 downstream test sets. Examine the multiple scales of the DataComp benchmark, which support scaling-trend studies and accommodate researchers with varying compute resources. Gain insights into the promising results of baseline experiments, including the DataComp-1B dataset, which yields a model that outperforms OpenAI's CLIP on ImageNet under the same compute budget, and which improves on LAION-5B with roughly a 9x reduction in compute cost. Delve into the potential of the DataComp workflow for advancing multimodal datasets and AI training methodologies.
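The core DataComp loop described above — take a fixed candidate pool, apply a filtering strategy to produce a training subset, then train and evaluate with standardized code — can be illustrated with a minimal sketch. The names below (`Sample`, `filter_pool`) are hypothetical illustrations, not the actual DataComp API; a common baseline of this kind ranks pairs by a precomputed CLIP image-text similarity score and keeps the top fraction:

```python
import random
from dataclasses import dataclass

@dataclass
class Sample:
    """One candidate image-text pair from the fixed pool (hypothetical schema)."""
    url: str
    caption: str
    clip_score: float  # precomputed image-text similarity; higher = better aligned

def filter_pool(pool, keep_fraction=0.3):
    """CLIP-score filtering baseline: keep the top fraction of pairs by similarity.

    In DataComp, the pool is fixed (12.8B pairs at the largest scale) and only
    this selection step is up to the researcher; training and evaluation code
    are held constant across submissions.
    """
    ranked = sorted(pool, key=lambda s: s.clip_score, reverse=True)
    cutoff = int(len(ranked) * keep_fraction)
    return ranked[:cutoff]

if __name__ == "__main__":
    # Toy stand-in for the real Common Crawl pool.
    random.seed(0)
    pool = [Sample(f"img{i}.jpg", f"caption {i}", random.random()) for i in range(1000)]
    subset = filter_pool(pool, keep_fraction=0.3)
    print(f"kept {len(subset)} of {len(pool)} pairs")  # the chosen training set
```

Because the pool, training recipe, and 38-task evaluation suite are all fixed, any downstream performance difference between two submissions is attributable to the filtering strategy alone — which is the point of the benchmark.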

Syllabus

Why You Should Retool Your AI Training Set (Not Your Model)

Taught by

Snorkel AI

