The Key to Cost-Efficient Quality Text Annotation - Data Pre-Processing

BasisTech via YouTube

Classroom Contents

  1. Intro
  2. Blueprint for Supervised Machine Learning
  3. Goal: Maximize Manual Annotation Efficiency. 1. Deduplicate: minimize manual effort. Find unique subjects in our data sets so that humans only annotate each subject once and to prevent leaking dupli…
  4. Part 1 - Textual Deduplication: Measuring Similarity. How can we find
  5. Time Complexity of Pairwise Comparisons
  6. Textual Deduplication: LSH Bitwise Rotations
  7. Locality Sensitive Hashing: 32 Bit Simhash
  8. Part 2 - Text Normalization: Machine Representations
  9. Text Normalization: Unicode Examples. What's the difference?
  10. Text Normalization: Halfwidth & Fullwidth Katakana
  11. Text Normalization: Katakana Code Block
  12. Text Normalization: Halfwidth & Fullwidth Forms
  13. Text Normalization: Hebrew Presentation Forms
  14. Text Normalization: Unicode Normalization Forms
  15. Text Normalization: Composing Marks Normalization
  16. Text Normalization: Katakana Normalization
  17. Text Normalization: Hebrew Normalization
  18. Additional Normalization Resources
  19. Conclusion
  20. Attributions: The Noun Project