Overview
This tutorial covers text preprocessing for sentiment analysis with BERT, using the Hugging Face Transformers library, PyTorch, and Python. It walks through tokenization with BertTokenizer, adding special tokens, padding sequences to a fixed length, and creating attention masks. It also shows how to set up a notebook, explore the data, choose a maximum sequence length, create a PyTorch dataset, split the data into training, validation, and test sets, and set up data loaders. Together these steps form a typical NLP and machine learning workflow for sentiment analysis.
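The tokenization steps listed above (adding special tokens, padding to a fixed length, and building an attention mask) can be sketched in plain Python. This is a simplified illustration, not the tutorial's code. It assumes a hypothetical lowercase whitespace tokenizer and a toy vocabulary with made-up ids, whereas BertTokenizer splits text into WordPiece tokens from a pretrained vocabulary. With the real library, calling the tokenizer with options such as add_special_tokens=True, padding="max_length", truncation=True, and return_attention_mask=True performs these steps for you.

```python
from typing import Dict, List

# Hypothetical toy ids; a real BERT vocabulary assigns different ids to these tokens.
SPECIAL_TOKENS = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3}


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign an id to every lowercase whitespace token seen in `texts` (toy tokenizer)."""
    vocab = dict(SPECIAL_TOKENS)
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(text: str, vocab: Dict[str, int], max_length: int) -> Dict[str, List[int]]:
    """Add [CLS]/[SEP], truncate or pad to `max_length`, and build the attention mask."""
    if max_length < 2:
        raise ValueError("max_length must leave room for [CLS] and [SEP]")
    # Reserve two positions for the special tokens before truncating.
    tokens = text.lower().split()[: max_length - 2]
    ids = (
        [vocab["[CLS]"]]
        + [vocab.get(token, vocab["[UNK]"]) for token in tokens]
        + [vocab["[SEP]"]]
    )
    # 1 marks a real token the model should attend to; 0 marks padding.
    attention_mask = [1] * len(ids)
    padding = max_length - len(ids)
    ids += [vocab["[PAD]"]] * padding
    attention_mask += [0] * padding
    return {"input_ids": ids, "attention_mask": attention_mask}


vocab = build_vocab(["great app", "terrible update broke it"])
print(encode("Great app", vocab, max_length=6))
# {'input_ids': [2, 4, 5, 3, 0, 0], 'attention_mask': [1, 1, 1, 1, 0, 0]}
```

Padding every sequence to the same length lets a batch be stored as one rectangular tensor, and the attention mask tells the model which positions are padding so they do not influence its output.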
Syllabus
Introduction
Notebook setup
Data exploration
Data preprocessing - tokenization, padding & attention mask
Choosing maximum sequence length
Create PyTorch dataset
Splitting the data into train, validation, and test sets
Creating data loaders
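Choosing a maximum sequence length, splitting the data, and batching can likewise be sketched in plain Python. The helper names (choose_max_length, train_val_test_split, batches), the 95% coverage default, and the 80/10/10 split are illustrative assumptions, not taken from the tutorial. In the PyTorch workflow the course describes, batching would normally be done by wrapping the encoded examples in a torch.utils.data.Dataset subclass and passing it to torch.utils.data.DataLoader. A helper such as scikit-learn's train_test_split could perform the split instead.

```python
import math
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def choose_max_length(token_counts: Sequence[int], coverage: float = 0.95) -> int:
    """Return the smallest length that fits at least `coverage` of the examples.

    `token_counts` are per-example token counts without special tokens, so two
    extra positions are added for [CLS] and [SEP].
    """
    if not token_counts:
        raise ValueError("token_counts must not be empty")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(token_counts)
    # Index of the k-th smallest count, where k = ceil(coverage * n).
    index = math.ceil(coverage * len(ordered)) - 1
    return ordered[index] + 2


def train_val_test_split(
    items: Sequence[T], val_frac: float = 0.1, test_frac: float = 0.1, seed: int = 42
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle with a fixed seed, then carve off test and validation slices."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test : n_test + n_val]
    train = shuffled[n_test + n_val :]
    return train, val, test


def batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches; the last one may be smaller than `batch_size`."""
    for start in range(0, len(items), batch_size):
        yield list(items[start : start + batch_size])


counts = [5, 12, 7, 30, 9, 11, 8, 6, 10, 14]
print(choose_max_length(counts, coverage=0.9))  # 16: only the 30-token outlier is truncated

train, val, test = train_val_test_split(list(range(10)))
print(len(train), len(val), len(test))  # 8 1 1

for batch in batches(train, batch_size=4):
    print(batch)  # two batches of four shuffled indices
```

Taking the length from a high percentile rather than from the longest example keeps padding, and therefore compute, down at the cost of truncating a few outliers.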
Taught by
Venelin Valkov