The Key to Cost-Efficient Quality Text Annotation - Data Pre-Processing

BasisTech via YouTube

Classroom Contents

  1. Intro
  2. Blueprint for Supervised Machine Learning
  3. Goal: Maximize Manual Annotation Efficiency. 1. Deduplicate: minimize manual effort; find unique subjects in our data sets so that humans only annotate each subject once and to prevent leaking dupli…
  4. Part 1 - Textual Deduplication: Measuring Similarity. How can we find…
  5. Time Complexity of Pairwise Comparisons
  6. Textual Deduplication: LSH Bitwise Rotations
  7. Locality Sensitive Hashing: 32-Bit Simhash
  8. Part 2 - Text Normalization: Machine Representations
  9. Text Normalization: Unicode Examples. What's the difference?
  10. Text Normalization: Halfwidth & Fullwidth Katakana
  11. Text Normalization: Katakana Code Block
  12. Text Normalization: Halfwidth & Fullwidth Forms
  13. Text Normalization: Hebrew Presentation Forms
  14. Text Normalization: Unicode Normalization Forms
  15. Text Normalization: Composing Marks Normalization
  16. Text Normalization: Katakana Normalization
  17. Text Normalization: Hebrew Normalization
  18. Additional Normalization Resources
  19. Conclusion
  20. Attributions: The Noun Project
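The deduplication items in the outline (similarity measurement, the cost of pairwise comparison, bitwise rotations, and a 32-bit simhash) can be illustrated with a short sketch. The talk's exact algorithm is not reproduced here; this is a minimal Python example under stated assumptions: lowercase word tokens, a 32-bit token hash derived from `hashlib.blake2b`, and a simplified sort-and-compare-neighbours scheme over rotated fingerprints to avoid checking all n*(n-1)/2 pairs. The function names and the `max_distance`, `window`, and `step` parameters are hypothetical.

```python
import hashlib
import re

BITS = 32
MASK = (1 << BITS) - 1


def token_hash(token: str) -> int:
    # Assumed stable 32-bit token hash: blake2b with a 4-byte digest.
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=4).digest()
    return int.from_bytes(digest, "big")


def simhash(text: str) -> int:
    # Each token votes +1 or -1 on every bit position; a positive tally sets the bit.
    votes = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        h = token_hash(token)
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, v in enumerate(votes):
        if v > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    # Number of differing bits between two fingerprints.
    return bin(a ^ b).count("1")


def rotate_left(x: int, r: int) -> int:
    # Rotate a 32-bit value left by r bits.
    r %= BITS
    return ((x << r) | (x >> (BITS - r))) & MASK


def near_duplicate_pairs(docs, max_distance=3, window=4, step=4):
    # Hypothetical approximate search: sort fingerprints under several rotations
    # and compare each document only with its next `window` neighbours, instead
    # of all n*(n-1)/2 pairs. Pairs never adjacent under any rotation are missed.
    prints = [simhash(d) for d in docs]
    pairs = set()
    for r in range(0, BITS, step):
        order = sorted(range(len(docs)), key=lambda i: rotate_left(prints[i], r))
        for pos, i in enumerate(order):
            for j in order[pos + 1 : pos + 1 + window]:
                if hamming(prints[i], prints[j]) <= max_distance:
                    pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)
```

For example, `near_duplicate_pairs(["the cat sat on the mat", "The cat sat on the mat!", "lattice gauge theory"])` reports `(0, 1)` because both texts tokenize identically and get the same fingerprint. Rotation moves different bit ranges into the high-order position, so documents whose fingerprints differ only in a few bits tend to land next to each other in at least one sorted order; the trade-off is recall for far fewer comparisons.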
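The normalization items in the outline (halfwidth and fullwidth katakana, fullwidth forms, Hebrew presentation forms, composing marks, and the Unicode normalization forms) all concern strings that look alike but differ at the code-point level. A minimal Python sketch using the standard `unicodedata.normalize` function shows which of NFC, NFD, NFKC, and NFKD make each sample pair compare equal; the sample pairs and the helper names `equal_under` and `report` are illustrative choices, not taken from the talk.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Pairs that render alike (or nearly so) but use different code points.
SAMPLES = {
    # U+00E9 vs "e" + U+0301 COMBINING ACUTE ACCENT
    "composed vs decomposed e-acute": ("\u00e9", "e\u0301"),
    # halfwidth KA + halfwidth voiced mark vs standard katakana GA (U+30AC)
    "halfwidth vs fullwidth katakana GA": ("\uff76\uff9e", "\u30ac"),
    # fullwidth A, B, C, 1 vs ASCII
    "fullwidth vs ASCII letters/digits": ("\uff21\uff22\uff23\uff11", "ABC1"),
    # U+FB21 HEBREW LETTER WIDE ALEF (presentation form) vs U+05D0 ALEF
    "Hebrew wide alef vs alef": ("\ufb21", "\u05d0"),
}


def equal_under(a: str, b: str, form: str) -> bool:
    """Return True if a and b are identical after normalization `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def report() -> dict:
    # For each sample pair, record which normalization forms make it equal.
    return {
        label: {form: equal_under(a, b, form) for form in FORMS}
        for label, (a, b) in SAMPLES.items()
    }


if __name__ == "__main__":
    for label, result in report().items():
        print(label, result)
```

Canonical forms (NFC, NFD) only unify composed and decomposed marks; the halfwidth katakana, fullwidth, and Hebrew presentation-form pairs become equal only under the compatibility forms (NFKC, NFKD). Compatibility normalization also discards distinctions such as superscripts and ligatures, so choosing it before annotation is a design decision rather than a free win.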
