BLEURT - Learning Robust Metrics for Text Generation

Overview

Explore a comprehensive video explanation of the BLEURT paper, which proposes a learned, BERT-based evaluation metric for text generation models. Dive into the challenges of evaluating machine translation systems and learn how BLEURT addresses them through a novel pre-training scheme that uses millions of synthetic examples to help the model generalize. Discover the key components of the approach, including fine-tuning BERT, generating the synthetic data, and priming the model via auxiliary tasks. Examine the experimental results, the metric's robustness to distribution shifts, and potential concerns about this approach. Gain insights into BLEURT's state-of-the-art results on the WMT Metrics shared tasks from 2017 to 2019 and on the WebNLG Competition dataset.
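
To make the synthetic-data idea concrete, here is a minimal Python sketch of one perturbation the paper describes: randomly dropping words from a reference sentence to produce a degraded "candidate" paired with the original. The function names `drop_words` and `make_synthetic_pairs` are hypothetical, and this is a simplified illustration rather than the authors' code. The paper also generates pairs with BERT mask-filling and backtranslation, and during pre-training labels each pair with automatic signals such as BLEU, ROUGE, and BERTscore; that labeling step is not shown here.

```python
import random


def drop_words(sentence: str, rng: random.Random) -> str:
    """Return a copy of `sentence` with a random number of words removed.

    Hypothetical helper: a simplified take on the word-dropping perturbation,
    not the BLEURT authors' implementation.
    """
    tokens = sentence.split()
    if len(tokens) <= 1:
        # Nothing sensible to drop; return the sentence unchanged.
        return sentence
    # Drop between 1 and len(tokens) - 1 words so at least one word survives.
    n_drop = rng.randint(1, len(tokens) - 1)
    drop_idx = set(rng.sample(range(len(tokens)), n_drop))
    # Keep the surviving words in their original order.
    kept = [tok for i, tok in enumerate(tokens) if i not in drop_idx]
    return " ".join(kept)


def make_synthetic_pairs(references, seed=0):
    """Pair each reference sentence with a perturbed 'candidate' version.

    Hypothetical helper. In BLEURT, pairs like these would then be scored with
    automatic signals (e.g. BLEU, ROUGE, BERTscore) to serve as pre-training
    targets; that step is omitted here.
    """
    rng = random.Random(seed)  # seeded for reproducible perturbations
    return [(ref, drop_words(ref, rng)) for ref in references]


pairs = make_synthetic_pairs(["the cat sat on the mat", "hello"])
for reference, candidate in pairs:
    print(f"{reference!r} -> {candidate!r}")
```

Because each candidate is a damaged copy of a known reference, such pairs can be produced in bulk from unlabeled text, which is what lets the pre-training stage see far more examples than the few thousand human ratings available for fine-tuning.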

Syllabus

- Intro & High-Level Overview
- The Problem with Evaluating Machine Translation
- Task Evaluation as a Learning Problem
- Naive Fine-Tuning BERT
- Pre-Training on Synthetic Data
- Generating the Synthetic Data
- Priming via Auxiliary Tasks
- Experiments & Distribution Shifts
- Concerns & Conclusion