Classroom Contents
Latency Distributions and Micro-Benchmarking to Identify and Characterize Kernel Hotspots
- 1 Intro
- 2 Why Large Bare Metal Boxes? • Faster local communication: UNIX domain sockets, shared memory
- 3 The Scale in our Department • 100K processes across hundreds of physical machines
- 4 SysV semaphore bottleneck (AIX)
- 5 Observations and Findings • AIX CPU measurement with hyper-threading is very misleading • No 'out of the box' metrics on SysV IPC operations • Sporadic slowness (depending on concurrency/contention)
- 6 SysV shared memory bottleneck (Linux) • Low-level application infrastructure code dropping messages • Messaging leverages a form of "zero copy" IPC using SysV
- 7 SysV shared memory bottleneck (Linux RHEL 6) • The micro-benchmark
- 8 Case #2: Observations and Findings • No 'out of the box' metrics on SysV IPC operations
- 9 UNIX domain socket bottleneck (Solaris) • Critical software infrastructure experiencing timeouts on load • Identity management with very strict SLOs • Narrowing down the problem • A key SLI for the service…
- 10 An Aside: Histograms and Distributions are Useful! • More representative of the data set
- 11 An Aside: A Histogram Example
- 12 Early Observations • No out of the box metrics on socket operations
- 13 Case #3: UNIX domain socket bottleneck (Solaris) • The micro-benchmark • t-testing against size
- 14 Case #3: Conclusions • Solaris 11.3 is limited to a max of 256K UDS sockets
- 15 Task clone and exit bottleneck (Linux)
- 16 More Summary (Plea to Kernel Folks) • The Prime Directive of Monitoring: Non-interference
- 17 References
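
The outline above pairs micro-benchmarks with latency histograms (chapters 10 to 13). As a rough illustration of that idea, and not the speaker's actual benchmark, the following minimal Python sketch times round trips over a UNIX domain socket pair and groups the samples into power-of-two buckets. The function names (`log2_bucket`, `histogram`, `bench_uds_round_trip`), the 64-byte payload, and the iteration count are all assumptions made for this example.

```python
import socket
import time
from collections import Counter


def log2_bucket(ns: int) -> int:
    """Return the lower bound (in ns) of the power-of-two bucket holding ns."""
    return 0 if ns < 1 else 1 << (ns.bit_length() - 1)


def histogram(samples_ns):
    """Count samples per power-of-two latency bucket."""
    return Counter(log2_bucket(s) for s in samples_ns)


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, since a stream recv may return fewer."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before full payload arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def bench_uds_round_trip(iterations: int = 1000, payload: bytes = b"x" * 64):
    """Time send+recv over a connected UNIX domain socket pair (hypothetical workload)."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    samples = []
    try:
        for _ in range(iterations):
            start = time.perf_counter_ns()
            a.sendall(payload)
            recv_exact(b, len(payload))
            samples.append(time.perf_counter_ns() - start)
    finally:
        a.close()
        b.close()
    return samples


if __name__ == "__main__":
    samples = bench_uds_round_trip()
    # Print one text bar per bucket, scaled to at most 50 characters.
    for bucket, count in sorted(histogram(samples).items()):
        bar = "#" * max(1, count * 50 // len(samples))
        print(f"{bucket:>10} ns | {bar} {count}")
```

A distribution like this exposes a slow tail that a single mean would hide, which is the point the "Histograms and Distributions are Useful!" chapter title makes. Python interpreter overhead dominates the measured times here, so a real kernel hotspot study would more plausibly use a lower-level language or kernel tracing.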