YouTube videos curated by Class Central.
Classroom Contents
The Key to Cost-Efficient Quality Text Annotation - Data Pre-Processing
- 1 Intro
- 2 Blueprint for Supervised Machine Learning
- 3 Goal: Maximize Manual Annotation Efficiency: 1. Deduplicate: minimize manual effort. Find unique subjects in our data sets so that humans only annotate each subject once and to prevent leaking dupli…
- 4 Part 1 - Textual Deduplication: Measuring Similarity. How can we find
- 5 Time Complexity of Pairwise Comparisons
- 6 Textual Deduplication: LSH Bitwise Rotations
- 7 Locality Sensitive Hashing: 32 Bit Simhash
- 8 Part 2 - Text Normalization: Machine Representations
- 9 Text Normalization: Unicode Examples. What's the difference?
- 10 Text Normalization: Halfwidth & Fullwidth Katakana
- 11 Text Normalization: Katakana Code Block
- 12 Text Normalization: Halfwidth & Fullwidth Forms
- 13 Text Normalization: Hebrew Presentation Forms
- 14 Text Normalization: Unicode Normalization Forms
- 15 Text Normalization: Composing Marks Normalization
- 16 Text Normalization: Katakana Normalization
- 17 Text Normalization: Hebrew Normalization
- 18 Additional Normalization Resources
- 19 Conclusion
- 20 Attributions: The Noun Project