Get up and running with natural language processing (NLP) using R, the popular programming language for statistical computing and graphics.
Syllabus
Introduction
- Welcome to natural language processing with R
- Skills and tools you’ll need to be successful in this course
- What is tm and why do you need it?
- tm documentation walk-through
- Real-world NLP with tm
- Real-world NLP with quanteda
- Real-world NLP with tidytext
- Understanding corpora and sources
- Examining corpora
- Examining sources
- Custom sources
- Combining and subsetting corpora
- Working with document metadata
- Make useful metadata
- Finding and filtering based on metadata
- Transformations
- Stop words
- Stemming
- Lemmatization
- Tokenization
- N-grams
- Part-of-speech tagging
- Understanding the document-term matrix
- Create the document-term matrix
- Weighting the document-term matrix
- Focus the document-term matrix
- Word and document frequency
- Hierarchical clustering
- Associated terms
- What is sentiment analysis?
- Real-world example of sentiment analysis
- Sentiment datasets
- Sentiment tools
- Plotting text mining
- Plotting Zipf’s and Heaps’ laws
- Word clouds
- Your next steps in NLP
Taught by
Mark Niemann-Ross