Longformer - The Long-Document Transformer

Yannic Kilcher via YouTube

Overview

Explore a comprehensive video analysis of the Longformer, an extension of the Transformer model designed to process long documents. Delve into its key ideas, sliding window attention and sparse global attention, which let the model handle sequences thousands of tokens long. Examine how this attention pattern scales linearly with sequence length, overcoming the quadratic cost of traditional self-attention. Learn about the model's performance on character-level language modeling, where it achieves state-of-the-art results on the text8 and enwik8 datasets. Discover how the pretrained Longformer, when fine-tuned on a variety of downstream tasks, consistently outperforms RoBERTa on long-document tasks. Gain insights into the architecture's combination of local windowed attention with task-motivated global attention, and understand the significance of this advance for natural language processing over extensive documents.
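
To make the scaling argument concrete, here is a minimal NumPy sketch of the attention pattern described above: a symmetric sliding window plus a few globally attending positions. The function name, the window size, and the choice of token 0 as the global token are illustrative assumptions for this sketch, not the paper's implementation (which relies on an optimized banded-matrix kernel).

```python
import numpy as np

def longformer_attention_mask(seq_len, window, global_idx):
    """Boolean mask: True where query position i may attend to key position j.

    Illustrative sketch combining a symmetric sliding window of `window`
    tokens on each side with full ("global") attention for a few chosen
    positions, in the spirit of the Longformer's attention pattern.
    """
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # Sliding window: each token attends only to its local neighborhood,
    # so the number of attended pairs grows linearly with seq_len.
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True

    # Global attention: designated tokens (e.g. a [CLS]-style token at
    # position 0) attend to all positions, and all positions attend back.
    for g in global_idx:
        mask[g, :] = True
        mask[:, g] = True
    return mask

mask = longformer_attention_mask(seq_len=4096, window=256, global_idx=[0])
# Allowed query-key pairs vs. full self-attention: roughly n*(2w+1) vs n*n.
print(mask.sum(), 4096 * 4096)  # ~2M vs ~16.8M entries
```

Counting the allowed query-key pairs gives roughly n(2w+1) entries instead of n², which is the linear-versus-quadratic difference the video walks through.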

Syllabus

Introduction
Problem
Transformer Model
Keys and Queries
Convolutional Network
Dilated Window
Global Attention

Taught by

Yannic Kilcher
