

Representational Strengths and Limitations of Transformers

Google TechTalks via YouTube

Overview

Explore the mathematical foundations of attention layers in transformers in this Google TechTalk presented by Clayton Sanford. Delve into both positive and negative results on the representational power of attention layers, with a focus on intrinsic complexity parameters such as width, depth, and embedding dimension. Discover how transformers outperform recurrent and feedforward networks on a sparse averaging task, with the required network size scaling logarithmically rather than polynomially in the input size. Examine the limitations of attention layers on a triple detection task, where the required complexity scales linearly with input size. Learn how communication complexity is applied to the analysis of transformers, and gain insight into the representational properties and inductive biases of neural networks. Clayton Sanford is a PhD student at Columbia studying machine learning theory; the talk also touches on his work on learning combinatorial algorithms with transformers and on climate modeling using machine learning.
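The sparse averaging and triple detection tasks are only named in this summary. As a rough sketch of how they might be formalized, assuming the formulations from the talk's accompanying paper (the function names, sizes, and the q and modulus parameters below are illustrative, not taken from the talk):

```python
import numpy as np

def sparse_averaging_targets(values, index_sets):
    """Sparse averaging task: output i is the average of the value
    vectors selected by the small index set attached to token i."""
    targets = np.zeros_like(values)
    for i, idx in enumerate(index_sets):
        targets[i] = values[np.asarray(idx)].mean(axis=0)
    return targets

def triple_detection(xs, modulus):
    """Triple detection task: does any triple of inputs sum to 0 mod modulus?"""
    n = len(xs)
    return any(
        (xs[i] + xs[j] + xs[k]) % modulus == 0
        for i in range(n)
        for j in range(i + 1, n)
        for k in range(j + 1, n)
    )

# Tiny usage example: n = 6 tokens, d = 4 dimensions, q = 2 indices per token.
rng = np.random.default_rng(0)
n, d, q = 6, 4, 2
values = rng.normal(size=(n, d))
index_sets = [rng.choice(n, size=q, replace=False) for _ in range(n)]
print(sparse_averaging_targets(values, index_sets).shape)  # (6, 4)
print(triple_detection([2, 3, 5, 7], modulus=10))          # True: 2 + 3 + 5 = 10
```

Per the summary above, the talk's results concern how the size a transformer needs to represent such targets grows with the number of input tokens: logarithmically for sparse averaging, but linearly for triple detection.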

Syllabus

Representational Strengths and Limitations of Transformers

Taught by

Google TechTalks

