Fastformer - Additive Attention Can Be All You Need

Overview

Explore a detailed analysis of the Fastformer, a proposed efficient Transformer model for text understanding, in this 36-minute video. Dive into the architecture's key components, including additive attention and element-wise multiplication, and understand how it aims to achieve linear complexity for processing long sequences. Compare Fastformer to classic attention mechanisms, examine potential issues with the architecture, and evaluate its effectiveness through experimental results. Gain insights into the ongoing research efforts to improve Transformer models for handling long contexts efficiently.
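To make the two components named above concrete, here is a minimal NumPy sketch of a single-head Fastformer-style layer. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`additive_pool`, `fastformer_layer`, `init_params`) and the parameter dictionary are hypothetical, and multi-head splitting, parameter sharing, dropout, and layer normalization are all omitted. The point it shows is that every step works on `(n, d)` arrays or `(d,)` vectors, so no `n × n` score matrix is ever built.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def additive_pool(vectors, w):
    """Additive attention: collapse (n, d) vectors into one (d,) vector.

    Each token gets a single scalar score from the learned vector w,
    so the cost is linear in the sequence length n.
    """
    d = vectors.shape[1]
    weights = softmax(vectors @ w / np.sqrt(d))  # (n,), sums to 1
    return weights @ vectors                     # (d,)


def fastformer_layer(x, params):
    """Single-head sketch; parameter names are illustrative assumptions."""
    q = x @ params["Wq"]                          # (n, d)
    k = x @ params["Wk"]                          # (n, d)
    v = x @ params["Wv"]                          # (n, d)

    global_q = additive_pool(q, params["wq"])     # (d,) global query
    p = global_q * k                              # element-wise, broadcast to (n, d)
    global_k = additive_pool(p, params["wk"])     # (d,) global key
    u = global_k * v                              # element-wise, broadcast to (n, d)

    # Output transform plus a residual connection back to the queries.
    return u @ params["Wo"] + q


def init_params(d, seed=0):
    """Random weights for demonstration only."""
    rng = np.random.default_rng(seed)
    mats = {name: rng.normal(size=(d, d)) / np.sqrt(d)
            for name in ("Wq", "Wk", "Wv", "Wo")}
    vecs = {name: rng.normal(size=d) for name in ("wq", "wk")}
    return {**mats, **vecs}


if __name__ == "__main__":
    x = np.random.default_rng(1).normal(size=(6, 8))  # 6 tokens, width 8
    out = fastformer_layer(x, init_params(8))
    print(out.shape)
```

Because tokens interact only through the two pooled vectors, the sketch also makes it easier to see why the video asks whether this still counts as attention.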

Syllabus

- Intro & Outline
- Fastformer description
- Baseline: Classic Attention
- Fastformer architecture
- Additive Attention
- Query-Key element-wise multiplication
- Redundant modules in Fastformer
- Problems with the architecture
- Is this even attention?
- Experimental Results
- Conclusion & Comments
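For the "Baseline: Classic Attention" item, the following is a minimal sketch of standard scaled dot-product attention in NumPy, included only as a point of comparison. The function name `classic_attention` is an illustrative choice, and masking and multiple heads are left out. Note the intermediate `(n, n)` weight matrix, which is the quadratic cost that linear-complexity designs such as Fastformer try to avoid.

```python
import numpy as np


def classic_attention(q, k, v):
    """Scaled dot-product attention for (n, d) queries, keys and values.

    Returns the (n, d) output and the (n, n) attention weights.
    """
    d = q.shape[1]
    scores = q @ k.T / np.sqrt(d)                        # (n, n): every token vs. every token
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilize the exponentials
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.normal(size=(6, 8)) for _ in range(3))
    out, weights = classic_attention(q, k, v)
    print(out.shape, weights.shape)
```

Doubling the sequence length quadruples the size of `weights` here, whereas an additive-attention layer only doubles its per-token work.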

Taught by

Yannic Kilcher
