Overview
Explore mojibake, the garbled text that appears when bytes are decoded with the wrong character encoding, in this 12-minute conference talk by Robyn Speer at !!Con 2021. Discover why real-world text sometimes shows up as garbled characters like "Merci de tÃ©lÃ©charger le plug-in" and learn about "ftfy", the Python module designed to repair such text. Delve into the causes of mojibake and strategies for fixing and preventing it, and understand why machine learning isn't the solution for this particular problem. Gain insights from Speer, the developer of the multilingual knowledge graph ConceptNet and former co-founder of an NLP startup, as she shares her expertise in handling text encoding challenges in natural-language systems.
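The garbled example in the description can be reproduced with the Python standard library alone. This is a minimal sketch, assuming the garbling comes from UTF-8 bytes being read as Windows-1252. That is one common cause, not necessarily the only one the talk covers. The codec choice and variable names are illustrative assumptions, not taken from the talk or from "ftfy".

```python
# Sketch: reproduce one common kind of mojibake, then undo it.
# Assumption: the garbling comes from UTF-8 bytes read as Windows-1252.

original = "Merci de télécharger le plug-in"

# Step 1: the text is saved correctly as UTF-8 bytes ("é" becomes 0xC3 0xA9).
utf8_bytes = original.encode("utf-8")

# Step 2: a reader decodes those bytes with the wrong codec,
# so each byte of "é" turns into its own character.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # Merci de tÃ©lÃ©charger le plug-in

# Step 3: undo the mistake by reversing it: back to the raw bytes
# with the wrong codec, then decode them as UTF-8.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # Merci de télécharger le plug-in
```

Step 3 works here only because the wrong codec is known in advance. In real data it usually is not, and that guesswork is what the "ftfy" module is meant to take over. With the package installed, ftfy.fix_text(garbled) is the call intended for this kind of repair.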
Syllabus
!!Con 2021 - Mojibake! What the h—ck happened to these strings? by Robyn Speer
Taught by
Confreaks