

Pre-Training BERT from Scratch for Domain-Specific Knowledge Using PyTorch - Part 51

Discover AI via YouTube

Overview

Learn to pre-train a BERT (Bidirectional Encoder Representations from Transformers) model from scratch in this comprehensive PyTorch tutorial aimed at domain-specific data. Master the process of training an optimized tokenizer, designing a custom BERT architecture, and implementing pre-training with a masked language modeling (MLM) head. Explore techniques for defining custom vocabulary sizes ranging from 8K to 60K tokens, configuring BERT architecture depths of up to 96 layers, and optimizing GPU training so the model encodes domain-specific knowledge. Gain hands-on experience with transformer-based machine learning for natural language processing, and discover how to build on the pre-trained model to create an SBERT (Sentence Transformers) model for neural information retrieval systems. Follow along with the provided Google Colab code to implement tokenization, model configuration, and the pretraining task, and to evaluate the training results through practical demonstrations.
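
The two building blocks described above, an optimized tokenizer and a custom BERT configuration, can be sketched roughly as follows. This is a minimal illustration, not the video's Colab code: it assumes the Hugging Face tokenizers and transformers libraries, and the corpus file name domain_corpus.txt and the chosen sizes are placeholders.

# Minimal sketch (assumed libraries: tokenizers, transformers; not the video's exact code).
from tokenizers import BertWordPieceTokenizer
from transformers import BertConfig, BertForMaskedLM

# 1) Train a WordPiece tokenizer with a custom vocabulary size
#    (anywhere in the 8K-60K range mentioned in the overview).
tokenizer = BertWordPieceTokenizer(lowercase=True)
tokenizer.train(
    files=["domain_corpus.txt"],          # placeholder domain corpus
    vocab_size=30_522,
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.save_model(".")                 # writes vocab.txt for later use

# 2) Define a custom BERT architecture; depth, width, and heads are free choices
#    (the video explores depths of up to 96 layers).
config = BertConfig(
    vocab_size=30_522,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    max_position_embeddings=512,
)
model = BertForMaskedLM(config)           # randomly initialized, ready for pretraining
print(f"parameters: {model.num_parameters():,}")

A companion sketch of the masking task and training arguments, corresponding to the pretraining chapters in the syllabus, follows the syllabus list below.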

Syllabus

Introduction
Downloading data sets
Tokenization
Tokenizer
Fast implementation
Fast tokenizer
Encoding
Training Data Set
BERT Model
BERT Model Configuration
BERT Model Pretraining
Masking Task
Training Arguments
Training Example
Training Results
Training Loss
Expert Model
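
The masking task and training arguments chapters above can be sketched along the following lines, assuming the Hugging Face datasets/transformers Trainer API (suggested, but not confirmed, by the chapter titles). File names, the vocab.txt path, and all hyperparameters are illustrative placeholders.

# Hedged sketch of MLM pretraining (assumed libraries: datasets, transformers).
from datasets import load_dataset
from transformers import (BertConfig, BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Tokenizer trained in the previous sketch (vocab.txt is a placeholder path).
tokenizer = BertTokenizerFast(vocab_file="vocab.txt")

# Randomly initialized BERT with an MLM head, sized to the custom vocabulary.
config = BertConfig(vocab_size=tokenizer.vocab_size)
model = BertForMaskedLM(config)

# Load and tokenize a plain-text domain corpus into fixed-length examples.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     max_length=128, padding="max_length")

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# The masking task: randomly replace 15% of input tokens with [MASK] so the
# model learns to reconstruct them from context.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

# Training arguments for GPU pretraining (values are examples only).
args = TrainingArguments(
    output_dir="bert-domain",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    learning_rate=1e-4,
    fp16=True,
)

trainer = Trainer(model=model, args=args, data_collator=collator,
                  train_dataset=tokenized["train"])
trainer.train()   # the reported training loss should fall as pretraining proceeds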

Taught by

Discover AI
