Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs

Overview

Explore a 16-minute conference presentation from SIGIR 2024 that examines how user feedback affects both crowdworkers and Large Language Models (LLMs) when they evaluate dialogue systems. Clemencia Siro, Mohammad Aliannejadi, and Maarten de Rijke present research findings that challenge traditional evaluation methods for conversational AI systems. Learn about the interplay between human evaluators and automated systems, and understand how incorporating user feedback can change the way dialogue system performance is assessed. This ACM-sponsored talk covers both human and machine-based assessment techniques.