
Dated Data: Tracing Knowledge Cutoffs in Large Language Models

Center for Language & Speech Processing(CLSP), JHU via YouTube

Overview

Watch a 14-minute award-winning conference presentation from Johns Hopkins University's Center for Language & Speech Processing that explores the complexities of knowledge cutoff dates in Large Language Models (LLMs). Dive into the critical distinction between reported and effective cutoff dates for training data, and understand why this matters for applications requiring current information. Learn about a novel approach to estimate effective cutoffs at the resource level by probing across different data versions, without needing access to pre-training data. Discover key findings that reveal significant discrepancies between reported and effective cutoffs, attributed to temporal misalignments in CommonCrawl data and complications in LLM deduplication schemes. Gain valuable insights into why cutoff dates are more nuanced than previously thought, and understand the implications for both LLM dataset curators and practitioners implementing these models.
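To make the probing idea concrete, here is a minimal sketch (not the authors' implementation) of how one might estimate an effective cutoff for a single resource: score dated versions of its text with a language model and take the version the model finds most familiar, i.e. lowest perplexity. The model name, the `dated_versions` mapping, and the helper functions are illustrative assumptions.

```python
# Minimal sketch: estimate a model's effective cutoff for one resource by
# scoring dated text versions and picking the one with the lowest perplexity.
# Assumes a Hugging Face causal LM and a dict mapping version dates to text.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text: str) -> float:
    """Token-level perplexity of `text` under the model."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())

def effective_cutoff(model, tokenizer, dated_versions: dict[str, str]) -> str:
    """Return the version date whose snapshot the model scores as most familiar."""
    scores = {date: perplexity(model, tokenizer, text)
              for date, text in dated_versions.items()}
    return min(scores, key=scores.get)

# Hypothetical usage: `versions` maps "YYYY-MM" dates to snapshots of one page.
# model = AutoModelForCausalLM.from_pretrained("gpt2")
# tokenizer = AutoTokenizer.from_pretrained("gpt2")
# print(effective_cutoff(model, tokenizer, versions))
```

Aggregating such per-resource estimates across many documents is what lets the paper compare effective cutoffs against the cutoffs model providers report.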

Syllabus

Dated Data: Tracing Knowledge Cutoffs in Large Language Models (COLM 2024 Outstanding Paper Award)

Taught by

Center for Language & Speech Processing(CLSP), JHU

