Overview
Explore a 19-minute IEEE conference talk on exposing complex backdoors in NLP transformer models. Learn why detecting backdoors in natural language processing models is harder than in computer vision, and discover the solutions proposed by researchers from Purdue University and Rutgers University. Delve into the structure of NLP transformers, understand the difficulties posed by discrete input spaces and models that are not differentiable with respect to their inputs, and examine the proposed techniques of differentiable model transformation and word-level inversion. Gain insights into the effectiveness of the PICCOLO approach for uncovering sophisticated backdoor attacks in transformer-based language models, and access the code repository for further exploration.
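To make the first technique concrete, here is a minimal, self-contained PyTorch sketch of trigger inversion through a differentiable relaxation. It is an illustration of the idea, not PICCOLO's actual implementation: the tiny mean-pooling classifier stands in for a transformer, and all names, sizes, and hyperparameters (model_logits, trigger_len, the learning rate, the step count) are assumptions chosen for the demo. Instead of searching over discrete token ids, it optimizes a soft distribution over the vocabulary at each trigger position and feeds the resulting weighted embeddings to the model.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for a transformer classifier; all sizes are illustrative.
vocab_size, embed_dim, num_classes, trigger_len = 1000, 32, 2, 3
embedding = torch.nn.Embedding(vocab_size, embed_dim)
classifier = torch.nn.Linear(embed_dim, num_classes)

def model_logits(embeds):
    # Mean-pool token embeddings, then classify: a stand-in for the
    # transformer layers, which likewise operate on embeddings.
    return classifier(embeds.mean(dim=1))

# Challenge I: token ids are discrete, so we cannot backpropagate to them.
# The relaxation: optimize a soft distribution over the vocabulary for each
# trigger position and feed the weighted average of embedding vectors into
# the model instead of hard token ids.
trigger_logits = torch.zeros(trigger_len, vocab_size, requires_grad=True)
optimizer = torch.optim.Adam([trigger_logits], lr=0.1)
target_class = 1  # the label the hypothetical backdoor flips inputs to

for step in range(200):
    optimizer.zero_grad()
    soft_tokens = F.softmax(trigger_logits, dim=-1)   # (L, V), differentiable
    trigger_embeds = soft_tokens @ embedding.weight   # (L, D) soft embeddings
    logits = model_logits(trigger_embeds.unsqueeze(0))
    loss = F.cross_entropy(logits, torch.tensor([target_class]))
    loss.backward()
    optimizer.step()

# Project the optimized distributions back to concrete token ids.
inverted_ids = trigger_logits.argmax(dim=-1)
print("candidate trigger token ids:", inverted_ids.tolist())
```

Projecting each position back to its argmax token independently is exactly where token-level inversion breaks down for words that tokenize into several subword pieces, which motivates the word-level approach in the syllabus below.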
Syllabus
Intro
Backdoor Attacks on NLP Models
Trigger Inversion Is Highly Effective at Detecting Backdoors in Computer Vision
Structure of NLP Transformers
Challenge I: Input Space Is Discrete and NLP Models Are Not Differentiable with Respect to the Input
Proposal for Challenge I: Differentiable Model Transformation
Challenge II: Token-Level Optimization Cannot Reverse Engineer Complex Words with Multiple Tokens
Proposal for Challenge II: Word-Level Inversion (see the sketch after this list)
Overview
Evaluation Setup
Effectiveness
Code Repo
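For the word-level inversion idea referenced above, here is a companion sketch in the same toy setting. Again, this is an illustration rather than the paper's code: the word-to-token dictionary is a hypothetical random mapping (in practice it would come from the tokenizer), and every identifier and size is an assumption. The point it demonstrates is that optimizing one distribution over whole words moves all of a word's subword tokens together, which per-token optimization cannot do.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative sizes; none of this is PICCOLO's actual code.
token_vocab, embed_dim, num_classes = 1000, 32, 2
num_words, max_subtokens = 500, 4

embedding = torch.nn.Embedding(token_vocab, embed_dim)
classifier = torch.nn.Linear(embed_dim, num_classes)

# Hypothetical word-to-token dictionary: word i tokenizes into a fixed
# sequence of subword ids (random here; from the real tokenizer in practice).
word_to_tokens = torch.randint(0, token_vocab, (num_words, max_subtokens))
# Precompute each word's sequence of subtoken embeddings: (W, S, D).
word_token_embeds = embedding(word_to_tokens).detach()

def model_logits(embeds):
    return classifier(embeds.mean(dim=1))  # stand-in for the transformer

# Challenge II: a multi-token word cannot be recovered by optimizing each
# subword position independently. Word-level inversion instead optimizes a
# single distribution over whole words.
word_scores = torch.zeros(num_words, requires_grad=True)
optimizer = torch.optim.Adam([word_scores], lr=0.1)
target_class = 1

for step in range(200):
    optimizer.zero_grad()
    word_probs = F.softmax(word_scores, dim=-1)  # (W,)
    # Expected subtoken embeddings under the word distribution: (S, D).
    trigger_embeds = torch.einsum("w,wsd->sd", word_probs, word_token_embeds)
    logits = model_logits(trigger_embeds.unsqueeze(0))
    loss = F.cross_entropy(logits, torch.tensor([target_class]))
    loss.backward()
    optimizer.step()

print("most likely trigger word index:", word_scores.argmax().item())
```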
Taught by
IEEE Symposium on Security and Privacy