Overview
Explore a research presentation from USENIX Security '24 that investigates critical security vulnerabilities in customized Large Language Models (LLMs). Learn about the first instruction backdoor attacks targeting applications integrated with untrusted customized LLMs, particularly focusing on GPTs. Discover how researchers developed three levels of attacks - word-level, syntax-level, and semantic-level - that can embed backdoors through prompt design without modifying the underlying LLM architecture. Examine experimental results across 6 major LLMs and 5 benchmark text classification datasets, demonstrating successful attack implementation while maintaining model utility. Understand proposed defense strategies against these vulnerabilities and grasp the broader implications for LLM customization security.
Syllabus
USENIX Security '24 - Instruction Backdoor Attacks Against Customized LLMs
Taught by
USENIX