Overview
This 21-minute data mining lecture covers Min Hashing and the statistics behind it. It explains how to choose k, the number of hash functions, using Probably Approximately Correct (PAC) learning, the Central Limit Theorem, and the Chernoff-Hoeffding Inequality. The lecture then applies these results to show how large k must be for Min Hashing to give an accurate estimate of Jaccard Similarity, and why these statistical guarantees make Min Hashing a dependable technique in data mining applications.
Syllabus
Recording Start
Lecture starts
Course Materials Copyright
Announcements
Choosing k for minhashing motivation
PAC
Central Limit Theorem
Chernoff-Hoeffding Inequality
Choosing k for a good estimate of JS
Recording ends
Taught by
UofU Data Science