Overview
Explore the mathematical foundations of attention layers in transformers through this Google TechTalk presented by Clayton Sanford. Delve into both positive and negative results on the representational power of attention layers, focusing on intrinsic complexity parameters such as width, depth, and embedding dimension. Discover how transformers outperform recurrent and feedforward networks on a sparse averaging task, where the required size scales only logarithmically with input length rather than polynomially. Examine the limitations of attention layers on a triple detection task, where the required size scales linearly with input length. Learn how communication complexity is applied to the analysis of transformers, and gain insights into the representational properties and inductive biases of neural networks. The speaker, Clayton Sanford, is a PhD student at Columbia studying machine learning theory; the talk also touches on his work on learning combinatorial algorithms with transformers and on climate modeling with machine learning.
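To make the two tasks concrete, here is a minimal sketch, assuming the standard definitions from the accompanying paper: a sparse averaging task in which each position must average the value vectors at q designated positions, and a triple detection (Match3-style) task in which a position fires if some three entries sum to zero modulo M. The function names, the choice of modulus, and the brute-force reference implementations below are illustrative and are not taken from the talk.

import numpy as np

def sparse_average_target(x, S):
    """Sparse averaging: x is an (N, d) array of value vectors and S is an
    (N, q) array of indices; row i of the output is the mean of x[j] over j in S[i]."""
    return x[S].mean(axis=1)  # x[S] has shape (N, q, d); average the q gathered rows per position

def match3_target(z, M):
    """Triple detection: z is a length-N array of integers mod M; output bit i is 1
    iff there exist positions j, k with z[i] + z[j] + z[k] == 0 (mod M)."""
    N = len(z)
    out = np.zeros(N, dtype=int)
    for i in range(N):
        if any((z[i] + z[j] + z[k]) % M == 0 for j in range(N) for k in range(N)):
            out[i] = 1
    return out

# Tiny illustrative example: N = 6 positions, value dimension d = 3, q = 2 indices per position.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))
S = rng.integers(0, 6, size=(6, 2))
y_avg = sparse_average_target(x, S)                        # the task transformers handle efficiently
y_m3 = match3_target(rng.integers(0, 17, size=6), M=17)    # the task that is hard for attention layers

The contrast between these two targets is the point of the talk: the first admits transformer constructions whose complexity parameters grow only logarithmically with the input length, while the second provably forces them to grow linearly.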
Syllabus
Representational Strengths and Limitations of Transformers
Taught by
Google TechTalks