Explore a case study of Expedia's investigation into a disruptive incident caused by lock contention in Istio. Dive deep into the systematic approach used to identify the root cause despite limited observability. Learn about the impact on performance and business metrics, and about the collaboration with Istio's open source maintainers to address the issue. Discover how Expedia's Compute platform team built a custom Istio Pilot with enhanced observability and modified controller queues to pinpoint the problem. Gain insight into the profiling process that uncovered the lock contention, which caused significant delays in proxy propagation and, in turn, outages. Finally, examine the interim workaround Expedia implemented, and its effects, before an upstream fix became available.