Overview
Watch a research lecture exploring mechanisms for establishing provenance in language model artifacts, covering both generated text and model weights. Learn about distortion-free watermarking techniques for autoregressive language model outputs that remain robust even when a constant fraction of the text is edited, developed in collaboration with John Thickstun, Tatsu Hashimoto, and Percy Liang. Discover methods for testing the independence of language model training processes by examining model weights, based on research conducted with Sally Zhu, Ahmed Ahmed, and Percy Liang. Gain insights into the latest developments in language model security, alignment, and copyright protection in this technical presentation by Stanford University researcher Rohith Kuditipudi at the Simons Institute.
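To give a flavor of the watermarking idea mentioned above, the sketch below shows one well-known distortion-free construction: sampling each token with the Gumbel/exponential-minimum trick against a shared pseudorandom key, then detecting the watermark by scoring how well the text aligns with that key. This is a minimal illustration, not the lecture's own code; the toy language model, key length, and detection statistic are assumptions, and a full scheme of this kind also includes an edit-distance-style alignment step (omitted here) to tolerate edited text.

```python
"""Minimal sketch of distortion-free watermarked sampling via the
exponential-minimum (Gumbel) trick against a shared pseudorandom key.
The toy "language model" and constants below are illustrative only."""
import numpy as np

VOCAB_SIZE = 50
KEY_LEN = 256
key_rng = np.random.default_rng(seed=42)        # secret key shared by generator and detector
KEY = key_rng.random((KEY_LEN, VOCAB_SIZE))     # u[t, i] ~ Uniform(0, 1)


def toy_lm_probs(prev_token: int) -> np.ndarray:
    """Stand-in next-token distribution; a real LM would go here."""
    rng = np.random.default_rng(prev_token)     # deterministic per context
    logits = rng.normal(size=VOCAB_SIZE)
    p = np.exp(logits - logits.max())
    return p / p.sum()


def watermarked_sample(prev_token: int, t: int) -> int:
    """Pick argmax_i u_i^(1/p_i). Averaged over a uniform key this equals
    sampling from p, so the watermark introduces no distortion on average."""
    p = toy_lm_probs(prev_token)
    u = KEY[t % KEY_LEN]
    return int(np.argmax(u ** (1.0 / np.maximum(p, 1e-12))))


def detection_score(tokens: list[int]) -> float:
    """Watermarked text selects tokens where the key value u is large,
    so -log(1 - u) summed over positions exceeds its null expectation."""
    return float(sum(-np.log(1.0 - KEY[t % KEY_LEN][tok])
                     for t, tok in enumerate(tokens)))


# Generate watermarked tokens and compare against unwatermarked tokens.
wm_tokens, tok = [], 0
for t in range(80):
    tok = watermarked_sample(tok, t)
    wm_tokens.append(tok)

plain_tokens = list(np.random.default_rng(7).integers(0, VOCAB_SIZE, size=80))
print("watermarked score:  ", round(detection_score(wm_tokens), 1))
print("unwatermarked score:", round(detection_score(plain_tokens), 1))
```

Under the null (text generated without knowledge of the key), each per-token term behaves like an Exp(1) variable, so the detector can threshold the total score to flag watermarked text with a controlled false-positive rate.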
Syllabus
Distortion-free mechanisms for language model provenance
Taught by
Simons Institute