Overview
Explore ALiBi (Attention with Linear Biases), a method for improving length extrapolation in transformer models. Dive into the limitations of traditional position encodings and discover how ALiBi's simple yet effective approach enables efficient extrapolation to sequences longer than those seen during training. Learn about the implementation details, including how to choose the slope parameter, and examine experimental results demonstrating ALiBi's extrapolation advantages. Gain insights into why this method performs better and understand its potential impact on natural language processing tasks.
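The core idea covered in the video is that ALiBi adds no position embeddings to the token embeddings; instead, each attention head adds a head-specific linear penalty to the pre-softmax attention scores based on query-key distance. Below is a minimal sketch of that idea, assuming a PyTorch setting; the function names alibi_slopes and alibi_bias are illustrative placeholders, not code from the lecture or the paper.

```python
import torch


def alibi_slopes(num_heads: int) -> torch.Tensor:
    # Slopes form a geometric sequence: for n heads, start at 2^(-8/n) and use
    # the same value as the ratio, i.e. slope_i = 2^(-8*i/n) for i = 1..n.
    # (The paper handles non-power-of-2 head counts slightly differently.)
    start = 2 ** (-8.0 / num_heads)
    return torch.tensor([start ** i for i in range(1, num_heads + 1)])


def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Bias grows linearly (and negatively) with how far back a key is from the
    # query: bias[h, i, j] = -slope_h * (i - j) for j <= i.
    positions = torch.arange(seq_len)
    distances = positions[None, :] - positions[:, None]   # element [i, j] = j - i
    distances = distances.clamp(max=0)                     # zero out future positions (handled by the causal mask anyway)
    slopes = alibi_slopes(num_heads)                        # shape (num_heads,)
    return slopes[:, None, None] * distances[None, :, :]   # shape (num_heads, seq_len, seq_len)


# The bias is simply added to the pre-softmax attention logits.
num_heads, seq_len = 8, 6
scores = torch.randn(num_heads, seq_len, seq_len)  # stand-in attention logits
attn = torch.softmax(scores + alibi_bias(num_heads, seq_len), dim=-1)
```

Because the penalty depends only on relative distance, the same bias pattern extends naturally to sequence lengths not seen during training, which is what allows the extrapolation discussed in the video.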
Syllabus
- Intro & Overview
- Position Encodings in Transformers
- Sinusoidal Position Encodings
- ALiBi Position Encodings
- How to choose the slope parameter
- Experimental Results
- Comments & Conclusion
Taught by
Yannic Kilcher