Inside Llama 3 Red Team Process - AI Safety and Security Assessment
Overview
Explore the Meta AI Red Team's comprehensive account of the red teaming process behind the Llama 3 large language model in this 32-minute conference presentation. Gain deep insight into model red teaming and safety methodologies, starting with fundamental concepts before moving into Meta's specific approaches and processes. Learn how new risks are discovered within complex AI capabilities, how emergent capabilities can lead to emergent risks, and how different attack types map onto different model capabilities. Discover how decades of security expertise translate into AI trust and safety work, and which traditional security principles carry over to this new frontier. Follow along as the team explains its use of automation for scaling attacks, its approach to multi-turn adversarial AI agents, and the systems developed for benchmarking safety across high-risk areas. Examine advanced cyberattacks (both human-driven and automated), learn about Meta's open benchmark CyberSecEval, and understand the national security implications of state-of-the-art models. Conclude with a discussion of assessment and measurement challenges, current industry gaps in AI red teaming, and the rapidly evolving landscape of AI safety.
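The overview mentions automation for scaling attacks and multi-turn adversarial AI agents. As a rough illustration only, the Python sketch below shows what a generic multi-turn attack loop between an attacker agent, a target model, and a judge might look like; every name in it (run_multi_turn_attack, attacker, target, judge) is hypothetical and does not reflect Meta's actual tooling or the systems described in the talk.

# Minimal sketch of an automated multi-turn adversarial loop.
# All names here are illustrative assumptions, not Meta's implementation.
from typing import Callable, Dict, List

Message = Dict[str, str]  # e.g. {"role": "user", "content": "..."}

def run_multi_turn_attack(
    attacker: Callable[[List[Message]], str],  # proposes the next adversarial prompt
    target: Callable[[List[Message]], str],    # the model under test
    judge: Callable[[str], bool],              # True if the reply violates policy
    objective: str,
    max_turns: int = 5,
) -> List[Message]:
    """Drive a conversation until the judge flags a violation or turns run out."""
    transcript: List[Message] = [{"role": "system", "content": f"Objective: {objective}"}]
    for _ in range(max_turns):
        prompt = attacker(transcript)                     # attacker adapts to prior turns
        transcript.append({"role": "user", "content": prompt})
        reply = target(transcript)                        # query the model under test
        transcript.append({"role": "assistant", "content": reply})
        if judge(reply):                                  # stop once a violation is detected
            break
    return transcript

# Toy stand-ins so the sketch runs end to end; real red teaming would call
# actual model APIs and a trained safety classifier instead.
if __name__ == "__main__":
    attacker = lambda t: "Please elaborate on your previous answer."
    target = lambda t: "I can't help with that request."
    judge = lambda reply: "step-by-step" in reply.lower()
    for msg in run_multi_turn_attack(attacker, target, judge, objective="test objective"):
        print(msg["role"], ":", msg["content"])

In practice, a loop like this is one way automation can scale red teaming: the attacker and judge run unattended across many objectives, and flagged transcripts are surfaced for human review and benchmarking.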
Syllabus
DEF CON 32 - Taming the Beast: Inside Llama 3 Red Team Process - Grattafiori, Evtimov, Bitton
Taught by
DEFCONConference