Big Bird: Transformers for Longer Sequences

Yannic Kilcher via YouTube

Overview

Explore a comprehensive video explanation of the BigBird paper, which introduces a novel sparse attention mechanism for transformers to handle longer sequences. Learn about the challenges of quadratic memory requirements in full attention models and how BigBird addresses this issue through a combination of random, window, and global attention. Discover the theoretical foundations, including universal approximation and Turing completeness, as well as the practical implications for NLP tasks such as question answering and summarization. Gain insights into the experimental parameters, structured block computations, and results that demonstrate BigBird's improved performance on various NLP tasks and its potential applications in genomics.
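To make the sparse attention idea concrete, here is a minimal sketch (not taken from the video or the paper's code) of how a BigBird-style attention mask could be assembled from window, global, and random components. The sequence length, window size, number of global tokens, and random links per query are illustrative assumptions.

```python
import numpy as np

def bigbird_mask(seq_len=16, window=3, num_global=2, num_random=2, seed=0):
    """Build a boolean mask where mask[i, j] = True means token i attends to token j."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # Window attention: each token attends to its local neighbourhood.
    for i in range(seq_len):
        lo, hi = max(0, i - window // 2), min(seq_len, i + window // 2 + 1)
        mask[i, lo:hi] = True

    # Global attention: a few designated tokens attend to everything
    # and are attended to by everything.
    mask[:num_global, :] = True
    mask[:, :num_global] = True

    # Random attention: each token additionally attends to a few random positions.
    for i in range(seq_len):
        mask[i, rng.choice(seq_len, size=num_random, replace=False)] = True

    return mask

if __name__ == "__main__":
    m = bigbird_mask()
    # The number of attended pairs grows roughly linearly with seq_len,
    # rather than quadratically as in full attention.
    print(f"attended pairs: {m.sum()} of {m.size}")
```

In practice such a mask would be applied to the attention scores before the softmax, and (as discussed in the video) the window and random links are arranged into structured blocks so the sparse pattern can be computed efficiently on hardware.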

Syllabus

- Intro & Overview
- Quadratic Memory in Full Attention
- Architecture Overview
- Random Attention
- Window Attention
- Global Attention
- Architecture Summary
- Theoretical Result
- Experimental Parameters
- Structured Block Computations
- Recap
- Experimental Results
- Conclusion

Taught by

Yannic Kilcher
