Overview
Explore the strategies model builders use to assemble large training datasets, and learn about two attacks that exploit those collection mechanics, in this 33-minute Black Hat conference talk. Learn why deep learning models that rely on massive, distributed datasets gathered from the internet are vulnerable, including issues with expired domains that malicious actors could exploit. Understand how this problem affects not only Stable Diffusion but also large language models such as ChatGPT that are trained on internet-sourced data. Gain insight into the practical implications of poisoning web-scale training datasets and the impact on popular AI models.
Syllabus
Poisoning Web-Scale Training Datasets is Practical
Taught by
Black Hat