How Multilingual Is Your NLP Model?

Overview

Explore the challenges of building multilingual Natural Language Processing (NLP) models, and some ways to address them, in this 45-minute PyCon US talk by Shreya Khurana. The talk covers language identification, transliterated and code-switched text, and multilingual BERT models. It reviews existing Python frameworks for language identification and their limitations, then shows how web crawlers and self-generated datasets can compensate for the lack of annotated transliterated and code-switched data. Practical examples examine the performance of Google's multilingual BERT model, which was trained on 104 languages, and the talk closes with guidance on evaluating NLP models for various tasks in a multilingual context. Additional resources and code examples are available on GitHub.
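To make the transliteration problem concrete, here is a minimal sketch that is not taken from the talk. It assumes a deliberately naive identifier that looks only at Unicode script; the function name dominant_script is hypothetical and uses only the Python standard library. It suggests why Hindi written in Latin characters cannot be told apart from English by script alone.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script among the letters of text, or None.

    Hypothetical helper for illustration only: the "script" is taken to be
    the first word of each letter's Unicode name, e.g. "LATIN" or "DEVANAGARI".
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():  # skips digits, spaces, punctuation, combining marks
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hindi in its native script is easy to separate from English.
native = dominant_script("नमस्ते दुनिया")
# The same greeting, transliterated into Latin characters, is not.
romanized = dominant_script("namaste duniya")
english = dominant_script("hello world")
```

Under these assumptions the native-script input is reported as Devanagari, while the romanized Hindi and the English input both come back as Latin, so a script-based check cannot separate them. Handling such text needs a model that looks at the words themselves.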

Syllabus

Introduction
About me
Outline
Why multilingual data
Tasks associated with language systems
Syntax mixing
Transliterated text
Language identification
Language identification in practice
Other examples
Lang ID
Blanked
Python
Limitations
Data augmentation
Simple example
The Transformer
Multi-headed attention
State of the art (SOTA)
Why is it special
WordPiece Processing
Statistics of Languages
BERT Masked Language Model
Prediction Function
Code-Switched Example
Lyrics Example
Task Evaluation
Generation Evaluation
Summary

Taught by

PyCon US
