Overview
Discover the key strategies and lessons learned in building GitHub's high-performance code search engine, designed to handle the platform's massive scale. Explore how the team leveraged the unique content-addressable nature of Git repositories to create the world's largest publicly available code search engine, indexing over 60 million repositories and 160 TB of content. Learn about innovative techniques such as deduplication, repository similarity analysis, full index compaction, multiple levels of sharding, and load balancing. Gain insights into transforming code search from a frustrating experience into a powerful feature for developers. This 36-minute conference talk, presented by Luke Francl at Strange Loop 2023, offers valuable knowledge for those interested in large-scale search engine development and optimization.
Syllabus
"Lessons from building GitHub code search" by Luke Francl (Strange Loop 2023)
Taught by
Strange Loop Conference