The Data Addition Dilemma: Navigating Distribution Shifts in Machine Learning

Simons Institute via YouTube

Overview

Watch a 37-minute lecture from UC Berkeley researcher Irene Y. Chen at the Simons Institute on why combining data from different sources for machine learning training isn't always beneficial. Learn about the "Data Addition Dilemma," where mixing dissimilar data sources can reduce accuracy, create fairness issues, and harm performance for underrepresented groups. Examine the fundamental trade-off between the benefits of increased data scale and the drawbacks of distribution shift when datasets are combined. Discover practical strategies and heuristics for deciding which data sources to combine to get the largest gains in model performance. Gain insight into key considerations for data collection and composition as AI models continue to grow in size and complexity.
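
The trade-off between scale and distribution shift can be illustrated with a toy example. The sketch below is not taken from the lecture: the source names, sample sizes, slopes, and noise level are all made-up assumptions, and a one-parameter least-squares fit stands in for a real model. It fits a line to a small target sample alone, then with 500 extra points from a similar source, then with 500 extra points from a source whose input-output relationship differs, and scores each fit on held-out target data.

```python
import random

def make_source(n, slope, noise, rng):
    """Draw n (x, y) pairs with y = slope * x + Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, slope * x + rng.gauss(0.0, noise)))
    return data

def fit_slope(data):
    """Least-squares slope for a line through the origin."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / sxx

def mse(slope, data):
    """Mean squared error of the fitted line on a dataset."""
    return sum((y - slope * x) ** 2 for x, y in data) / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical sources: the target relationship is y = 2x.
train_target = make_source(20, 2.0, 1.0, rng)    # small target sample
similar = make_source(500, 2.0, 1.0, rng)        # same relationship
shifted = make_source(500, -1.0, 1.0, rng)       # different relationship
test_target = make_source(2000, 2.0, 1.0, rng)   # held-out target data

err_alone = mse(fit_slope(train_target), test_target)
err_similar = mse(fit_slope(train_target + similar), test_target)
err_shifted = mse(fit_slope(train_target + shifted), test_target)

print(f"target only:       {err_alone:.3f}")
print(f"+ similar source:  {err_similar:.3f}")
print(f"+ shifted source:  {err_shifted:.3f}")
```

In this setup the shifted source outweighs the target sample 25 to 1, so the fitted slope is pulled toward the wrong relationship and error on target data rises even though the training set is far larger. Adding the similar source would typically lower error, which is the scale benefit the overview refers to.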

Syllabus

The Data Addition Dilemma

Taught by

Simons Institute
