Overview
Watch a 13-minute conference talk from USENIX Security '24 exploring a novel approach to adversarial training in deep neural networks. Learn how researchers from Ben-Gurion University tackle adversarial examples by splitting each class into "clean" and "adversarial" categories, doubling the number of classes but simplifying the decision boundaries between them. Discover how this method produces robust models while maintaining optimal or near-optimal natural accuracy, demonstrated through experiments on the CIFAR-10 dataset achieving 95.01% accuracy. Understand the theoretical framework behind the approach and its practical implications for real-world applications where natural accuracy is crucial, as a general way to add significant robustness to classifiers while minimizing the degradation of their natural accuracy.
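To make the class-splitting idea concrete, here is a minimal PyTorch sketch. It assumes a classifier with 2K output logits; the one-step FGSM attack, the loss helper, and the prediction-merging rule are illustrative assumptions for exposition, not the authors' exact training recipe from the talk.

```python
import torch
import torch.nn.functional as F

K = 10  # number of original classes (e.g., CIFAR-10)

def fgsm_perturb(model, x, y, eps=8 / 255):
    # One-step FGSM adversary, used here as a stand-in attack;
    # the talk does not prescribe this particular choice.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def split_class_loss(model, x, y):
    # Clean inputs keep their label c; adversarial inputs get the
    # twin label c + K, so the network is trained over 2K classes.
    x_adv = fgsm_perturb(model, x, y)
    logits = model(torch.cat([x, x_adv]))  # model must output 2K logits
    targets = torch.cat([y, y + K])
    return F.cross_entropy(logits, targets)

def merged_predict(model, x):
    # At test time, collapse each twin pair back to one class:
    # predict argmax over p(c) + p(c + K).
    probs = F.softmax(model(x), dim=1)
    return (probs[:, :K] + probs[:, K:]).argmax(dim=1)
```

In this sketch, robustness comes from the adversarial twin classes absorbing perturbed inputs during training, while merging the twins at inference keeps the output space and the natural-accuracy evaluation unchanged.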
Syllabus
USENIX Security '24 - Splitting the Difference on Adversarial Training
Taught by
USENIX