Instruction Backdoor Attacks Against Customized Large Language Models

Overview

Explore a research presentation from USENIX Security '24 that investigates critical security vulnerabilities in customized Large Language Models (LLMs). Learn about the first instruction backdoor attacks targeting applications integrated with untrusted customized LLMs, particularly focusing on GPTs. Discover how researchers developed three levels of attacks - word-level, syntax-level, and semantic-level - that can embed backdoors through prompt design without modifying the underlying LLM architecture. Examine experimental results across 6 major LLMs and 5 benchmark text classification datasets, demonstrating successful attack implementation while maintaining model utility. Understand proposed defense strategies against these vulnerabilities and grasp the broader implications for LLM customization security.

Syllabus

USENIX Security '24 - Instruction Backdoor Attacks Against Customized LLMs

Taught by

USENIX

Reviews

Start your review of Instruction Backdoor Attacks Against Customized Large Language Models

Taught by

Building Production-Ready Apps with Large Language Models

Ethical Hacking against and with AI/LLM/ML (Lite Version!)

Large Language Models for Code Analysis - Evaluating LLM Performance and Limitations

Formalizing and Benchmarking Prompt Injection Attacks and Defenses

The Security of Large Language Models

PICCOLO - Exposing Complex Backdoors in NLP Transformer Models

Never Stop Learning.