Explore the challenges and solutions of implementing Kernel Live Patching (KLP) at hyperscale in this conference talk by David Carl Vernet and Song Liu from Meta. Dive into the intricacies of applying quick fixes to live Linux kernels without shutting down workloads or rebooting servers. Learn about the unique issues faced when deploying KLP across a fleet of several million servers, including performance regressions, conflicts with tracing and monitoring tools, and sporadic failures during patch application. Gain insights into Meta's approach to building, deploying, and monitoring KLPs at scale, and discover recent improvements to KLP infrastructure. Examine outstanding issues with KLP at hyperscale and potential solutions. This updated version of the talk, originally presented at Linux Plumbers Conference 2022, includes additional details and visuals to make the content more accessible to mid-level engineers.
Syllabus
Introduction
Full reboot
Pros and cons
kexec
Pros and cons
KLP
Ecosystem
Challenges
Bugs
Problem
Tracing
Taught by
Linux Foundation