DeBERTa - Decoding-Enhanced BERT with Disentangled Attention

Yannic Kilcher via YouTube

Classroom Contents

  1. Intro & Overview
  2. Position Encodings in Transformer's Attention Mechanism
  3. Disentangling Content & Position Information in Attention
  4. Disentangled Query & Key Construction in the Attention Formula
  5. Efficient Relative Position Encodings
  6. Enhanced Mask Decoder Using Absolute Position Encodings
  7. My Criticism of EMD
  8. Experimental Results
  9. Scaling Up to 1.5 Billion Parameters
  10. Conclusion & Comments
