Syllabus
Intro
If data is fuel, then we need to measure its value
Data value in the context of ML
Ingredients of data value in ML
Leave-one-out method
Desirable properties
Data Shapley value
Applications of Data Shapley
UK Biobank: lung cancer prediction
Removing low value data improves prediction
Adding high value data improves prediction
Negative Shapley values identify mislabeled data
Domain adaptation: face recognition
Dermatology classification
Clinical notes NLP
Efficiently approximating Data Shapley
New frontiers of data valuation
Discussion
Taught by
Simons Institute