LongNet: Understanding Transformer Scaling to 1 Billion Tokens - A Technical Overview


AI Bites via YouTube

Now playing: Multi-head Dilated Attention (6 of 8)


Classroom Contents


  1. Intro
  2. Computational Complexity in LLM Models
  3. Sparse Attention Paper
  4. Self-Attention Overview
  5. Dilated Attention (see the sketch after this list)
  6. Multi-head Dilated Attention
  7. Distributed Training
  8. Evaluation of LongNet Dilated Attention
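
As quick orientation for chapters 5 and 6, here is a minimal PyTorch sketch of the dilated attention idea the video covers. This is not the paper's official implementation: it splits the sequence into segments of length w, keeps every r-th token within each segment, and runs standard attention on the sparsified segments. The function name and the toy tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def dilated_attention(q, k, v, w=8, r=2):
    """q, k, v: (batch, seq_len, dim); assumes seq_len is divisible by w."""
    b, n, d = q.shape
    out = torch.zeros_like(q)
    for start in range(0, n, w):                 # process one segment of length w
        idx = torch.arange(start, start + w, r)  # keep every r-th token in the segment
        qi, ki, vi = q[:, idx], k[:, idx], v[:, idx]
        attn = F.softmax(qi @ ki.transpose(-2, -1) / d ** 0.5, dim=-1)
        out[:, idx] = attn @ vi                  # scatter results back to the kept positions
    return out                                   # positions dropped here stay zero; in the
                                                 # multi-head variant, other heads with shifted
                                                 # offsets cover them

x = torch.randn(1, 16, 4)                        # 1 batch, 16 tokens, dim 4
y = dilated_attention(x, x, x, w=8, r=2)
print(y.shape)                                   # torch.Size([1, 16, 4])
```

Because each segment attends over only w/r tokens, the cost grows linearly with sequence length rather than quadratically, which is the scaling property discussed in the complexity and evaluation chapters.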
