Retentive Network - A Successor to Transformer for Large Language Models

Yannic Kilcher via YouTube

Contents

  1. Intro
  2. The impossible triangle
  3. Parallel vs sequential
  4. Retention mechanism
  5. Chunkwise and multi-scale retention
  6. Comparison to other architectures
  7. Experimental evaluation
