How to Fail Interpretability Research

Overview

Explore the pitfalls and misconceptions in interpretability research with Google Brain's Been Kim in this 45-minute talk from the Simons Institute's "Emerging Challenges in Deep Learning" series. Delve into common mistakes and misunderstandings in the field, including the pursuit of universal definitions, performance trade-offs, and evaluation methods. Examine the responsibilities of researchers in presenting explanations and the real-world implications of their work. Gain insights on how to approach interpretability research more effectively and ethically, avoiding common traps that can hinder progress in this crucial area of machine learning.

Syllabus

Intro
Premeditation of evils
Interpretability hype.
Well... we've been here before...
What this talk is about
What this talk is NOT about
We need interpretability to increase user trust.
We need to understand every single bit of the model.
Agenda: Many opportunities to fail.
We must first establish a universal mathematical definition of interpretability.
The performance and interpretability trade-off is inevitable.
How I present the explanations doesn't matter.
Since there is no good way to evaluate interpretability methods, I can only show you qualitative results.
I am a computer scientist! Running human
The explanation is always true; it is what the model thinks.
I'm just a researcher who provides technical tools. Real-world usage is something I cannot control.