Scalable Extraction of Training Data from Production Language Models

Simons Institute via YouTube

Overview

Watch a technical lecture by Google DeepMind researcher Nicholas Carlini on extracting training data from production language models. Learn about two novel attacks that extract megabytes of training data from ChatGPT, even though its alignment training is designed to prevent such extraction. The first attack exploits repetitive word patterns to make the model diverge and reveal training data; the second uses fine-tuning APIs to bypass safety measures. The lecture covers what these findings imply for alignment strategies and privacy-preserving machine learning, including how production models reproduce training data and how effective current safety mechanisms are. It also places the work in the broader context of language model security, alignment challenges, and the balance between model capabilities and data privacy.

Syllabus

Scalable Extraction of Training Data from (Production) Language Models

Taught by

Simons Institute
