Harmful Speech Detection by Language Models Exhibits Gender-Queer Dialect Bias
USC Information Sciences Institute via YouTube
Overview
Watch a research seminar exploring bias in content moderation systems and its impact on LGBTQ+ online expression. Examine findings from a study investigating how language models handle reclaimed slurs in gender-queer dialect, using the novel QueerReclaimLex dataset of 109 curated templates. Learn how five off-the-shelf language models perform when assessing the potential harm of these texts, and discover the challenges in accurately moderating content authored by gender-queer individuals. Understand the implications of current content moderation practices that disproportionately flag posts from transgender and non-binary users as toxic. Explore potential solutions through chain-of-thought prompting that helps language models take author identity into account, as sketched in the example below. Presented by Rebecca Dorn, a PhD candidate at USC's Information Sciences Institute, whose research focuses on AI fairness, natural language processing, and computational social science, particularly examining how NLP systems interact with marginalized communities' dialects.
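A minimal sketch (not taken from the talk) of what chain-of-thought prompting with author-identity context might look like. The prompt wording, the `query_model` callable, and the verdict parsing are all illustrative assumptions, not the study's actual setup.

```python
def build_cot_prompt(text: str, author_identity: str) -> str:
    """Assemble a prompt that asks the model to reason step by step,
    taking the author's stated identity into account before judging harm."""
    return (
        f"The following post was written by a {author_identity} author:\n"
        f'"{text}"\n\n'
        "Think step by step. First, consider whether any slur in the post "
        "may be reclaimed in-group language given the author's identity. "
        "Then answer with HARMFUL or NOT HARMFUL."
    )


def assess_harm(text: str, author_identity: str, query_model) -> str:
    """Send the chain-of-thought prompt to a model and extract its verdict.

    `query_model` is a hypothetical stand-in for whatever LLM API is used;
    it takes a prompt string and returns the model's text response.
    """
    response = query_model(build_cot_prompt(text, author_identity))
    return "NOT HARMFUL" if "NOT HARMFUL" in response.upper() else "HARMFUL"


if __name__ == "__main__":
    # Dummy model for demonstration only; always answers NOT HARMFUL.
    dummy = lambda prompt: "Reasoning: the term is used in-group... NOT HARMFUL"
    print(assess_harm("example post text", "non-binary", dummy))
```

The key design point is that the author's identity is surfaced explicitly in the prompt and the model is asked to reason about reclamation before labeling, rather than classifying the text in isolation.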
Taught by
USC Information Sciences Institute