SUSE Labs Director Talks Live Kernel Patching with kGraft

142

SUSE Labs last month announced details of its kGraft research project to enable live patching of the Linux kernel. The solution has its benefits, including reduced need for downtime and easier downtime scheduling. It also has some drawbacks, outlined below by Vojtech Pavlik, Director of SUSE Labs.

Vojtech Pavlik, Director of SUSE LabsThe code, set to be released in March, doesn’t patch kernel code in-place but rather uses an ftrace-like approach to replace whole functions in the Linux kernel with fixed variants, said Pavlik. SUSE then plans to submit it to the Linux kernel community for upstream integration.

“SUSE wants to make sure the code we show is one that passes the quality required for Linux kernel patch submissions,” Pavlik said, “and that it works well and its merits can be judged and discussed.”

In this Q&A, Pavlik goes into more detail on SUSE’s live kernel patching project; how the kGraft patch integrates with the Linux kernel; how it compares with other live-patching solutions; how developers will be able to use the upcoming release; and the project’s interaction with the kernel community for upstream acceptance.

Pavlik will also lead a technical session on kGraft at the upcoming Linux Foundation Collaboration Summit, March 26-28 in Napa, Ca. (Request an invitation.)

Linux.com: First, what is live kernel patching and why is it necessary?

Vojtech Pavlik: Downtime is expensive. Even scheduled downtime is expensive. The common solution for that is redundant systems, but making them redundant enough to allow for scheduled downtime without losing the redundancy for unplanned failures is expensive. Live kernel patching reduces the need for scheduled downtime and allows for easier planning of scheduled downtime by allowing critical fixes to be applied ahead of the downtime window, hence reducing costs.

How does kGraft work? Where does it integrate with the kernel and how does it execute?

Pavlik: kGraft works by replacing whole functions in the Linux kernel with fixed variants; it is not about patching code in-place. kGraft itself is a modification to the Linux kernel that uses parts of several existing Linux technologies, combining them to achieve its purpose.

First, a patch module that contains all the new functions and some initialization code that registers with the kGraft code in kernel is loaded. Since it contains the new functions as regular code, the kernel module loader links them to any functions they may be calling inside the kernel.

Then, kGraft uses an ftrace-like approach to replace existing functions with their fixed instances by inserting a long jump at the beginning of each function that needs to be replaced. Ftrace uses a clever method based on inserting a breakpoint opcode (INT3) into the patched code first, only then replacing the rest of the bytes by the jump address and removing the breakpoint and replacing it with long jump opcode. Inter-processor non-maskable interrupts are used throughout the process to flush speculative decoding queues of other CPUs in the system. This allows switching to the new function without ever stopping the kernel, not even for a very short moment. The interruptions by IPI NMIs can be measured in microseconds.

However, these steps alone would not be good enough: since the functions would be replaced non-atomically, a new fixed function in one part of the kernel could still be calling an old function elsewhere or vice versa. If the semantics of the function interfaces changed in the patch, chaos would ensue.

Thus, until all functions are replaced, kGraft uses an approach based on trampolines and similar to RCU (read-copy-update), to ensure a consistent view of the world to each userspace thread, kernel thread or kernel interrupt. This way, an old function always would call another old function and a new function always a new one. Once the patching is done, trampolines are removed and the code can operate at full speed without performance impact other than an extra long jump for each patched function.

How is kGraft different from other live patching solutions?

Pavlik: Unlike other Linux kernel live patching technologies, the kGraft patch module is compiled from a regular source code file, which eases review by a human developer. That source code is automatically generated from the source patch, the running kernel’s source code and running kernel’s debuginfo. And since with kGraft the resulting new code is a part of a regular kernel module, kGraft doesn’t require a complex instruction decoder or custom linking code and can rely on the in-kernel linker to link the module with existing kernel functions.

kGraft doesn’t require stopping the whole system while it is doing the patching. Other technologies may need to call stop_machine(), potentially several times, to be able to patch code; or they may require a checkpoint-kexec-restore on the whole system, stopping it for many seconds. Stopping the kernel, however, may be a deal breaker for low-latency applications.

kGraft does have some limitations, including that a kernel needs to include kGraft prior to it being patched. In other words, kGraft cannot patch an unknown third-party kernel. kGraft also doesn’t specifically handle situations where one compiler is used to compile the old kernel and another compiler is used for compiling the patch. I believe these limitations aren’t restricting to enthusiast users or Linux distributors that would want to use it – all that is required is that the build environment is stable and predictable.

What will potential users be able to accomplish with the first release, slated for March?

Pavlik: Users will be able to take a source patch, almost entirely automatically convert it into a patch module source code, compile it into a patch kernel module and load it, applying the fix to the running kernel. At that point, kGraft will be able to patch regular kernel code, as well as code used in kernel thread and interrupt contexts. What is missing currently is the full automation of the patch-to-module conversion. The complexity of patches that kGraft will be able to handle at this point is limited. In the future, we will expand on both ease of use and the scope of patches that can be converted and applied fully automatically.

Why is it necessary to integrate real-time patching into the Linux kernel?

Pavlik: Live patching is a technology that inevitably is depending on low-level kernel internals to do its work. As such, integration into the upstream Linux kernel ensures its long-term viability and allows for collaboration of competing contributors on its enhancement. SUSE strives to get all its enhancements to open source software integrated into upstream projects as contributions to open source community development. kGraft is not an exception.

Have you already been working with the kernel community on this? What has been their response?

Pavlik: “Release early, release often” is the open source motto. And while SUSE stands behind this idea, kGraft until recently has purely been a research project. Another motto frequently used on the Linux Kernel Mailing List is “Show me the code.” SUSE wants to make sure the code we show is one that passes the quality required for Linux kernel patch submissions, and that it works well and its merits can be judged and discussed. Hence the scheduled March release; that will be the starting point.

Will the live patch work equally well in a cloud environment as on bare metal?

Pavlik: Yes, virtualization that’s used in cloud environments is no impediment to using kGraft. The value of reduced downtime of individual services isn’t diminished by the fact that they are running inside a cloud. While kGraft will initially run on the x86-64 architecture only, its simple design makes extending it to work on other architectures – including ARM, IBM POWER or IBM System z – easy. That would allow it to work on a vast array of hardware, from cell phones to mainframes.

Vojtech Pavlik is a Director of SUSE Labs, a global team within R&D at SUSE focusing on furthering core Linux technologies. In his kernel developer past, he has worked on USB support in Linux and created the Linux Input subsystem. He enjoys solving interesting problems that Linux faces, recently proposing the MOK concept as a solution for UEFI Secure Boot on Linux.