YouTube

Train a Domain-Specific BERT Tokenizer Using SBERT - Part 50

Discover AI via YouTube

Overview

Learn to build and optimize an SBERT Sentence Transformer by training a specialized BERT tokenizer for complex domain-specific knowledge in this 15-minute tutorial. Master the implementation of Byte-Pair Encoding (BPE) tokenization for processing specialized content such as bio-pharmacological textbooks and scientific literature. Explore 3D visualizations of SBERT models to gain deeper insight into the step-by-step process of developing custom BERT models tailored to specific domains, such as biomedical research with macromolecular interdisciplinary content. Discover how to effectively encode domain-specific knowledge from sources like arXiv preprints or digitized textbooks, and transform those sentences into optimal input for specialized BERT models. Gain hands-on experience pre-training BERT models from scratch while leveraging the advantages of customized, contemporary pre-trained BERT tokenizers.
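To make the BPE idea concrete, here is a minimal, self-contained sketch of the core BPE training loop: start from characters and repeatedly merge the most frequent adjacent pair. The toy bio-flavored corpus and merge count are illustrative assumptions, not the video's actual data; real tokenizer training uses an optimized library rather than this pure-Python loop.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Rewrite every word, fusing each occurrence of the pair into one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

def learn_bpe(corpus, num_merges):
    """Learn a list of BPE merges from raw whitespace-split text."""
    # Start from individual characters; each merge adds one subword to the vocabulary.
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges

# Tiny illustrative corpus: frequent domain substrings become merges first.
merges = learn_bpe("kinase kinase kinases inhibitor inhibitors", 3)
print(merges)  # first merge is ('i', 'n'), the most frequent adjacent pair
```

Run on a real domain corpus, the same loop is what lets frequent terms like "kinase" survive as whole subword units instead of being shattered into characters.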

Syllabus

Introduction
Pretokenization
Trainer
Tokenizer
Demo
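The syllabus steps above map naturally onto the HuggingFace `tokenizers` library. The following is a hedged sketch of that pipeline, not the video's exact code: the corpus, vocabulary size, and pre-tokenizer choice here are illustrative assumptions.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# Illustrative domain corpus; the tutorial targets bio-pharmacological texts.
corpus = [
    "The kinase inhibitor binds the macromolecular target.",
    "Receptor phosphorylation modulates downstream signalling pathways.",
]

# Pretokenization: split raw text into word-level pieces before BPE runs.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Trainer: learn BPE merges, reserving BERT's special tokens up front.
trainer = BpeTrainer(
    vocab_size=1000,  # illustrative; real domain vocabularies are far larger
    special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
)

# Tokenizer: train on the in-memory corpus (a file path works too).
tokenizer.train_from_iterator(corpus, trainer)

# Demo: encode a domain sentence with the freshly trained tokenizer.
encoding = tokenizer.encode("kinase inhibitor")
print(encoding.tokens)
```

The trained tokenizer can then be saved with `tokenizer.save("tokenizer.json")` and paired with a BERT model pre-trained from scratch, as the tutorial goes on to describe.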

Taught by

Discover AI
