Nugget: Neural Agglomerative Embeddings of Text
Center for Language & Speech Processing (CLSP), JHU via YouTube
Overview
Explore Nugget, a novel approach to text embedding, in this 37-minute conference talk by Guanghui Qin from the Center for Language & Speech Processing at Johns Hopkins University. Learn how Nugget addresses the limitations of constant-size representations by dynamically encoding language into meaningful units based on a subset of input tokens. Discover how this method outperforms existing approaches on semantic comparison tasks and how it could expand the context window of language models. Gain insights into how Nugget is trained through tasks such as autoencoding and machine translation, and understand its implications for future language models that can process significantly larger amounts of content.
Syllabus
Nugget: Neural Agglomerative Embeddings of Text - Guanghui Qin
Taught by
Center for Language & Speech Processing (CLSP), JHU