

LongNet: Understanding Transformer Scaling to 1 Billion Tokens - A Technical Overview

AI Bites via YouTube

Overview

Explore LongNet, a transformer architecture from Microsoft Research, in this 12-minute technical video that breaks down how the model scales its input sequence length to 1 billion tokens. Learn why the quadratic cost of standard self-attention limits sequence length in large language models, starting with an overview of sparse attention mechanisms before diving into the dilated attention approach that lifts this limitation. Discover the implementation details of multi-head dilated attention, distributed training strategies for long sequences, and the performance evaluations that demonstrate LongNet's effectiveness. Through clear explanations and structured segments, gain insight into this ambitious step toward artificial general intelligence, complete with references to foundational work on self-attention and sparse attention architectures.
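For a concrete picture of the mechanism the video describes, here is a minimal single-head, non-causal sketch of one dilated attention branch in NumPy. It is an illustrative assumption, not the paper's reference implementation: the function name `dilated_attention` and its parameters are made up for this sketch. The sequence is split into segments of length `segment_len`, every `dilation`-th position within a segment is kept, and dense attention runs only on that slice, dropping the per-segment cost from O(w²) to O((w/r)²).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dilated_attention(q, k, v, segment_len, dilation):
    """One (segment_len, dilation) branch of dilated attention.

    q, k, v: (seq_len, d) arrays; assumes seq_len is divisible by
    segment_len. Illustrative sketch only, not the paper's code.
    """
    seq_len, d = q.shape
    out = np.zeros_like(v)
    for start in range(0, seq_len, segment_len):
        # Keep every `dilation`-th position inside this segment.
        idx = np.arange(start, start + segment_len, dilation)
        scores = q[idx] @ k[idx].T / np.sqrt(d)  # dense attention on the sparse slice
        out[idx] = softmax(scores) @ v[idx]      # scatter results back in place
    # Positions skipped by this branch stay zero; in LongNet they are
    # covered by other (segment length, dilation) branches.
    return out

# With fixed segment length w and dilation r, each of the N/w segments costs
# O((w/r)^2 * d), so the whole pass is linear in sequence length N, versus
# O(N^2 * d) for full self-attention.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(16, 8)) for _ in range(3))
print(dilated_attention(q, k, v, segment_len=8, dilation=2).shape)  # (16, 8)
```

In the full model, the outputs of several such branches are combined with dynamic weights, and different heads shift which positions each branch selects; those details are omitted here for brevity.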

Syllabus

- Intro
- Computational Complexity in LLMs
- Sparse Attention Paper
- Self-Attention Overview
- Dilated Attention
- Multi-head Dilated Attention
- Distributed Training
- Evaluation of LongNet Dilated Attention

Taught by

AI Bites
