Detecting Semantic Shift in Large Corpora by Exploiting Temporal Random Indexing
Alan Turing Institute via YouTube
Overview
This 30-minute talk from the Alan Turing Institute presents Temporal Random Indexing (TRI), a method for detecting semantic shift in large corpora. TRI builds WordSpaces that incorporate temporal information, so that changes in a word's meaning can be traced over time. The speaker, Assistant Professor Pierpaolo Basile of the University of Bari Aldo Moro, works at the intersection of natural language processing, machine learning, and information retrieval. The talk covers the motivation for detecting meaning shift, the distinction between synchronic and diachronic linguistics, and the use of distributional semantics. It then describes how time series are built and how change point detection is applied, and discusses the challenges of working with social media data. It also reports experiments on the Italian language and preliminary findings from an analysis of the UK Web Archive corpus, and explains how a gold standard for evaluation can be built from historical dictionaries.
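To make the idea of a temporal WordSpace more concrete, here is a minimal sketch in Python. It is not the speaker's implementation. It assumes the usual Random Indexing recipe: each term gets a sparse random vector that is shared across all time periods, and a word's vector for one period is the sum of the random vectors of its neighbours in that period. The constants `DIM`, `NONZERO` and `WINDOW`, the toy corpus, and all function names are hypothetical, and a fixed similarity threshold stands in for a proper change point detection method.

```python
import math
import random

DIM = 500      # size of the reduced space (assumed value)
NONZERO = 10   # number of +1/-1 entries per random vector (assumed value)
WINDOW = 2     # context words considered on each side (assumed value)


def random_vector(rng):
    """Sparse ternary vector: mostly zeros, with a few +1 and -1 entries."""
    vec = [0.0] * DIM
    for k, pos in enumerate(rng.sample(range(DIM), NONZERO)):
        vec[pos] = 1.0 if k % 2 == 0 else -1.0
    return vec


def build_word_space(sentences, index_vectors, rng):
    """Build one period's word space by summing neighbours' random vectors."""
    space = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            target = space.setdefault(word, [0.0] * DIM)
            start, stop = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
            for j in range(start, stop):
                if j == i:
                    continue
                # Random vectors are created once and reused in every period,
                # which is what keeps the per-period spaces comparable.
                if tokens[j] not in index_vectors:
                    index_vectors[tokens[j]] = random_vector(rng)
                for d, value in enumerate(index_vectors[tokens[j]]):
                    target[d] += value
    return space


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_series(word, periods, seed=0):
    """Cosine similarity of a word's vector between consecutive periods."""
    rng = random.Random(seed)
    index_vectors = {}
    spaces = [build_word_space(p, index_vectors, rng) for p in periods]
    zero = [0.0] * DIM
    return [
        cosine(spaces[t - 1].get(word, zero), spaces[t].get(word, zero))
        for t in range(1, len(spaces))
    ]


def detect_change_points(series, threshold=0.8):
    """Naive stand-in: flag periods whose similarity to the previous one is low."""
    return [t + 1 for t, sim in enumerate(series) if sim < threshold]


# Toy corpus: three periods, each a list of tokenised sentences.
periods = [
    [["mouse", "ate", "cheese"], ["cat", "chased", "mouse"]],
    [["mouse", "ate", "cheese"], ["cat", "chased", "mouse"]],
    [["click", "mouse", "button"], ["plug", "in", "mouse", "cable"]],
]
series = similarity_series("mouse", periods)
change_points = detect_change_points(series)
print(series, change_points)
```

In this toy example the first similarity is close to 1 because the first two periods are identical, while the second is much lower because the contexts of "mouse" change completely in the third period. On real data the series is noisy, so a statistical change point detection method would replace the fixed threshold.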
Syllabus
Intro
Motivation: Detect meaning shift
Synchronic vs. Diachronic Linguistics: Why?
Distributional semantics
Temporal Random Indexing
Methodology
Time Series
Change point detection
Social media
Data Format
TRI: The UK Web Archive
Build a gold standard for the evaluation: Historical dictionary
Taught by
Alan Turing Institute