Overview
Explore data pipelines in this third installment of the Introduction to Text Analytics with R video series. Explore textual data to identify pre-processing challenges, use the quanteda package for text analytics, and build a prototypical text analytics pre-processing pipeline. Learn about tokenization, lowercasing, stop word removal, and stemming, and learn how to build a document-frequency matrix (DFM) for training machine learning models. The Kaggle dataset and R code used in the series are available for hands-on practice, so you can apply these text analytics techniques in your own data science projects.
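The pipeline described above maps onto a handful of quanteda calls. The sketch below is written against quanteda 3.x; older releases name some arguments differently (for example, `remove_hyphens` instead of `split_hyphens`). The two example messages are made up for illustration and are not taken from the Kaggle data.

```r
library(quanteda)

# Two hypothetical example messages (not from the Kaggle dataset)
texts <- c("Free entry in a weekly competition to win FA Cup tickets!",
           "I'll call you later, I'm running late.")

# Tokenize into words, dropping numbers, punctuation, and symbols,
# and splitting hyphenated words into their parts
toks <- tokens(texts, what = "word",
               remove_numbers = TRUE, remove_punct = TRUE,
               remove_symbols = TRUE, split_hyphens = TRUE)

# Lowercase every token
toks <- tokens_tolower(toks)

# Remove English stop words using quanteda's built-in stop word list
toks <- tokens_select(toks, stopwords(), selection = "remove")

# Reduce tokens to their stems (Snowball English stemmer)
toks <- tokens_wordstem(toks, language = "english")

# Build the document-frequency matrix: one row per document, one column per term
toks_dfm <- dfm(toks, tolower = FALSE)
print(toks_dfm)
```

Passing `tolower = FALSE` to `dfm()` avoids lowercasing the tokens a second time, because they were already lowercased earlier in the pipeline. The resulting matrix, or a data frame derived from it, is what a classifier would be trained on.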
Syllabus
Intro
HTML Escapes
quanteda
Tokenization
Tokens
Stop Words
Quantity
Stem
DFM
Taught by
Data Science Dojo