Overview
Learn how to tackle complex Site Reliability Engineering (SRE) challenges through a conference talk that walks through a real-world case study in which Eclipse Trace Compass was used to resolve a system-wide slowdown affecting thousands of developers. Discover a systematic troubleshooting approach that covers handling multi-gigabyte logs, creating and sharing parsers, analyzing the data, and implementing custom analysis tools. Master practical logging and tracing strategies, along with pre-emptive log post-mortems, that can save organizations significant time and resources. Although specific identifiers have been anonymized, the talk follows the actual problem-solving process, from initial log collection through creating custom analysis parsers to implementing targeted solutions in Trace Compass. Gain insights applicable to system administrators, developers, DevOps engineers, managers, and anyone interested in understanding and optimizing system performance at scale.
Syllabus
Solving an Internal Real-World SRE Issue with Eclipse Trace Compass
Taught by
Eclipse Foundation