Classroom Contents
DeBERTa - Decoding-Enhanced BERT with Disentangled Attention
- 1 - Intro & Overview
- 2 - Position Encodings in Transformer's Attention Mechanism
- 3 - Disentangling Content & Position Information in Attention
- 4 - Disentangled Query & Key construction in the Attention Formula
- 5 - Efficient Relative Position Encodings
- 6 - Enhanced Mask Decoder using Absolute Position Encodings
- 7 - My Criticism of EMD
- 8 - Experimental Results
- 9 - Scaling up to 1.5 Billion Parameters
- 10 - Conclusion & Comments