The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing and Attribution in AI

USC Information Sciences Institute via YouTube

Overview

Explore the Data Provenance Initiative, a large-scale effort to audit and trace more than 1,800 text datasets used to train AI models. Learn about the legal and ethical concerns surrounding dataset licensing and attribution in the AI industry. Discover the tools and standards developed to trace each dataset's lineage: its sources and creators, its license conditions, and its subsequent use. Examine a landscape analysis that reveals stark differences in composition and focus between commercially open and commercially closed datasets. Speakers Anthony Chen, an engineer at Google DeepMind, and Shayne Longpre, a PhD candidate at MIT, present their findings and discuss what they mean for data transparency and understanding in AI development. Delve into the concern that closed datasets dominate key areas such as low-resource languages, creative tasks, and synthetic training data.

Syllabus

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI

Taught by

USC Information Sciences Institute