Explore hallucination detection in transformer language models through this 59-minute lecture presented by Sky Wang of Columbia University at the USC Information Sciences Institute. Delve into the design of probes trained on a model's internal representations to predict hallucinatory behavior in in-context generation tasks, and examine the creation of a span-annotated dataset covering both organic and synthetic hallucinations across a range of tasks.

Discover why probes trained on synthetic hallucinations face ecological-validity challenges when applied to detecting organic hallucinations. Analyze how the hallucination-relevant information carried in hidden states varies across tasks and data distributions, and investigate how the saliency of intrinsic and extrinsic hallucinations differs across layers, hidden-state types, and tasks. Learn how probing can serve as an efficient alternative to standard language model hallucination evaluation when model states are available.

Gain insights from Sky Wang, a Ph.D. candidate in Computer Science at Columbia University whose research focuses on Natural Language Processing, Computational Social Science, and mechanistic interpretability.
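To make the probing idea concrete, here is a minimal, self-contained sketch (not the speaker's actual code) of a linear probe trained to classify spans as hallucinated or faithful. A real probe would be fit on frozen transformer hidden states paired with span-level hallucination labels from the annotated dataset; synthetic Gaussian features stand in for hidden states here, and all names and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 64, 200  # hypothetical hidden-state dimension, spans per class

# Stand-ins for hidden states: faithful and hallucinated spans drawn
# from Gaussians with slightly different means (purely synthetic).
faithful = rng.normal(0.0, 1.0, size=(n, dim))      # label 0
hallucinated = rng.normal(0.5, 1.0, size=(n, dim))  # label 1
X = np.vstack([faithful, hallucinated])
y = np.concatenate([np.zeros(n), np.ones(n)])

# The probe: logistic regression over the (frozen) representations,
# trained by plain gradient descent on the cross-entropy loss.
w, b = np.zeros(dim), 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(hallucinated)
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

preds = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
accuracy = float(np.mean(preds == y))
print(f"probe train accuracy: {accuracy:.2f}")
```

Because the probe is a single linear layer, it is cheap to train per layer and per hidden-state type, which is what makes layer-by-layer comparisons of hallucination saliency practical.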