Overview
This 27-minute conference talk from Conf42 ML 2023 examines the clustering step of BERTopic topic modeling. It covers topic modeling use cases, the reasons for choosing BERTopic, and BERTopic's end-to-end flow. The talk then explains the HDBSCAN clustering algorithm: its foundations in DBSCAN, and how it uses k-NN and a minimum spanning tree to perform density-based spatial clustering without a fixed radius. It introduces the stability score "λ" and its role in selecting the final clusters. A practical demo and a performance comparison illustrate HDBSCAN's strengths and weaknesses. The talk closes with future scope and references for further reading.
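To make the k-NN and minimum spanning tree steps mentioned above more concrete, here is a minimal sketch using NumPy and SciPy. It is not the talk's demo code and not the implementation BERTopic relies on. The function name `mutual_reachability_mst`, the choice `k=3`, the convention that the k-th neighbour excludes the point itself, and the toy data are all assumptions made for illustration.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist


def mutual_reachability_mst(points, k=3):
    """Return MST edges (i, j, weight) of the mutual reachability graph, lightest first."""
    dist = cdist(points, points)  # pairwise Euclidean distances
    # Core distance: distance to the k-th nearest neighbour.
    # After sorting each row, column 0 is the point itself (distance 0).
    core = np.sort(dist, axis=1)[:, k]
    # Mutual reachability distance: max(core(a), core(b), d(a, b)).
    mreach = np.maximum(dist, np.maximum(core[:, None], core[None, :]))
    np.fill_diagonal(mreach, 0.0)  # zero entries are treated as "no edge"
    mst = minimum_spanning_tree(mreach).tocoo()
    edges = zip(mst.row.tolist(), mst.col.tolist(), mst.data.tolist())
    return sorted(edges, key=lambda e: e[2])


# Toy data (hypothetical): two tight blobs of 10 points each, far apart.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(5.0, 0.1, (10, 2))])
edges = mutual_reachability_mst(points, k=3)
i, j, weight = edges[-1]  # heaviest edge in the tree
print(f"heaviest MST edge joins points {i} and {j} (weight {weight:.2f})")
```

In this toy setup the heaviest edge is the one bridging the two blobs, so cutting it separates them. The sketch stops at the spanning tree; building the cluster hierarchy and selecting clusters by stability are not shown.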
Syllabus
intro
preface
who are we?
agenda
topic modeling use case
why bertopic?
bertopic end-to-end flow
clustering
dataset description
demo
what is hdbscan?
to understand hdbscan we need to know dbscan
what if there was no fixed radius?
k-nn algorithm to define radius
minimum spanning tree finds density and hierarchy
density based spatial clustering
stability score "λ"
final clusters
hdbscan steps
hdbscan - performance comparison
hdbscan - strengths and weaknesses
conclusion and future scope
references & resources
thank you
Taught by
Conf42