Textual Deduplication: LSH Bitwise Rotations
YouTube videos curated by Class Central.
Classroom Contents
The Key to Cost-Efficient Quality Text Annotation - Data Pre-Processing
- 1 Intro
- 2 Blueprint for Supervised Machine Learning
- 3 Goal: Maximize Manual Annotation Efficiency. 1. Deduplicate • Minimize manual effort. Find unique subjects in our data sets so that humans annotate each subject only once, and to prevent leaking dupli…
- 4 Part 1 - Textual Deduplication: Measuring Similarity. How can we find…
- 5 Time Complexity of Pairwise Comparisons
- 6 Textual Deduplication: LSH Bitwise Rotations
- 7 Locality Sensitive Hashing: 32 Bit Simhash
- 8 Part 2 - Text Normalization: Machine Representations
- 9 Text Normalization: Unicode Examples. What's the difference?
- 10 Text Normalization: Halfwidth & Fullwidth Katakana
- 11 Text Normalization: Katakana Code Block
- 12 Text Normalization: Halfwidth & Fullwidth Forms
- 13 Text Normalization: Hebrew Presentation Forms
- 14 Text Normalization: Unicode Normalization Forms
- 15 Text Normalization: Composing Marks Normalization
- 16 Text Normalization: Katakana Normalization
- 17 Text Normalization: Hebrew Normalization
- 18 Additional Normalization Resources
- 19 Conclusion
- 20 Attributions: The Noun Project
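Chapter 5 (Time Complexity of Pairwise Comparisons) concerns the cost of naive deduplication. Comparing every text with every other text means checking each unordered pair once, so the work grows quadratically with corpus size $n$. The corpus size of one million used below is an illustrative number, not one taken from the talk.

```latex
\binom{n}{2} = \frac{n(n-1)}{2} \in O(n^2),
\qquad
n = 10^6 \;\Rightarrow\; \frac{10^6\,(10^6 - 1)}{2} = 499{,}999{,}500{,}000 \approx 5 \times 10^{11}
```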
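Chapters 6 and 7 name locality sensitive hashing with a 32-bit simhash and bitwise rotations. The talk's exact method is not reproduced here; the following is a minimal Python sketch under stated assumptions: features are lowercase word tokens, each hashed to 32 bits with BLAKE2b (`hashlib.blake2b(..., digest_size=4)`), and two texts count as near-duplicates when their fingerprints differ in at most `max_distance` bits. The rotation step uses the pigeonhole idea: split the fingerprint into `max_distance + 1` disjoint blocks; two fingerprints within that distance must agree on at least one block, so rotating each block into the top bits and bucketing on it finds every candidate pair without comparing all pairs. All function names (`simhash32`, `rotate_left`, `near_duplicate_pairs`, ...) are hypothetical.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations

BITS = 32
MASK = (1 << BITS) - 1


def feature_hash(feature: str) -> int:
    """Stable 32-bit hash of one feature (BLAKE2b truncated to 4 bytes)."""
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=4).digest()
    return int.from_bytes(digest, "big")


def simhash32(text: str) -> int:
    """32-bit simhash; assumes lowercase word tokens as features."""
    weights = [0] * BITS
    for feature in re.findall(r"\w+", text.lower()):
        h = feature_hash(feature)
        for bit in range(BITS):
            # Each feature votes +1 or -1 on every bit position.
            weights[bit] += 1 if (h >> bit) & 1 else -1
    # A fingerprint bit is set when its total vote is positive.
    return sum(1 << bit for bit in range(BITS) if weights[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def rotate_left(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    r %= BITS
    return ((x << r) | (x >> (BITS - r))) & MASK


def near_duplicate_pairs(fingerprints: list[int], max_distance: int = 3) -> list[tuple[int, int]]:
    """Index pairs whose fingerprints differ in at most max_distance bits."""
    if not 0 <= max_distance < BITS:
        raise ValueError("max_distance must be in [0, BITS)")
    blocks = max_distance + 1
    block_bits = BITS // blocks
    candidates = set()
    for i in range(blocks):
        # Rotate block i into the top bits and bucket on it. Pairs within
        # max_distance agree on at least one disjoint block (pigeonhole).
        buckets = defaultdict(list)
        for idx, fp in enumerate(fingerprints):
            prefix = rotate_left(fp, i * block_bits) >> (BITS - block_bits)
            buckets[prefix].append(idx)
        for members in buckets.values():
            candidates.update(combinations(members, 2))
    # Confirm each candidate with an exact Hamming distance check.
    return sorted(
        (a, b) for a, b in candidates
        if hamming(fingerprints[a], fingerprints[b]) <= max_distance
    )
```

Rotation preserves Hamming distance, so the final check runs on the original fingerprints. Candidate generation touches each fingerprint `max_distance + 1` times instead of comparing all pairs, although a bucket that collects many fingerprints with the same prefix can still degrade toward quadratic work.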
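Chapters 9 through 17 cover Unicode text that looks identical to a reader but is encoded differently: halfwidth and fullwidth katakana, fullwidth Latin forms, Hebrew presentation forms, and composed versus decomposed marks. The following minimal sketch uses Python's standard `unicodedata` module and assumes NFKC is an acceptable canonical form for deduplication keys; NFKC is lossy by design, since compatibility mappings discard distinctions such as halfwidth versus fullwidth. The helper `normalize_text` and its `strip_marks` flag are hypothetical. Stripping combining marks removes Hebrew points, but it also turns ガ into カ, so it is shown only to illustrate that risk.

```python
import unicodedata


def normalize_text(text: str, strip_marks: bool = False) -> str:
    """Map look-alike Unicode sequences to one representation before dedup.

    NFKC folds compatibility characters (halfwidth katakana, fullwidth Latin,
    wide Hebrew letters, ligatures) into their ordinary forms and composes
    base letters with combining marks where Unicode allows it.
    """
    text = unicodedata.normalize("NFKC", text)
    if strip_marks:
        # Hypothetical option: decompose, drop every combining mark, recompose.
        # Lossy: removes Hebrew points, but also Japanese voicing marks.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        text = unicodedata.normalize("NFC", text)
    return text


examples = {
    "halfwidth katakana": "ｶﾀｶﾅ",
    "halfwidth katakana + voiced mark": "ｶﾞ",
    "fullwidth Latin and digits": "ＡＢＣ１２３",
    "e + combining acute": "e\u0301",
    "Hebrew shin with shin dot (U+FB2A)": "\ufb2a",
    "Hebrew wide alef (U+FB21)": "\ufb21",
}
for label, raw in examples.items():
    out = normalize_text(raw)
    print(f"{label}: {[hex(ord(c)) for c in raw]} -> {[hex(ord(c)) for c in out]}")
```

Note that precomposed Hebrew presentation forms such as U+FB2A are excluded from Unicode composition, so both NFC and NFKC leave them as a base letter followed by a separate point (U+05E9 U+05C1) rather than recomposing them.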