As announced on the kernel.org front page, the main kernel.org server (known as “hera”) was recently compromised by an unknown intruder. This person was able to gain “root” access, meaning they had the full run of the system. Speaking as just one of many members of the kernel development community, I can say that this episode is disturbing and embarrassing. But I can also say that there is no need to worry about the integrity of the kernel source or of any other software hosted on the kernel.org systems.
Kernel.org is, of course, the home for the Linux kernel. Many other projects live there as well. On the face of it, that would make kernel.org a tempting target for an attack. What self-respecting cracker wouldn’t want an opportunity to place some special code into the Linux kernel? Such code would, over time, find its way into millions of machines worldwide. The injection of backdoors or other malware is a concern for any software maintainer – open source or otherwise – but it turns out that we are well protected against that sort of attack.
If kernel developers worked by shipping simple files of source code around, they might well be vulnerable to malware added by an intruder. But that is not how kernel development is done. The code for the kernel (and for many other projects) is managed with the “git” source code management system. And git does not allow the code to be modified by third parties without people knowing about it. It’s worth taking a moment to look at how that works.
A cryptographic “hashing function” is a mathematical formula which boils the contents of a file down to a small number. “Small” is relative; git’s hash function (SHA-1) produces 160-bit numbers, which are quite big by normal standards: there are about 1.5 × 10^48 possible values, within a couple of orders of magnitude of the number of atoms in the Earth. The key to the hash function is that, if the contents of the file change, the hash will, for all practical purposes, change too. Creating any new file matching the hash of an existing file is not computationally feasible; if you want that new file to look like the old one with the exception of a bit of hostile code, the challenge is even bigger. So an attacker would be unable to change a file without changing its hash as well. Git checks hashes regularly, so a simplistic attempt to corrupt a file would be flagged almost immediately.
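To make that concrete, here is a minimal Python sketch of how git names a file's contents: it takes the SHA-1 of a short header (the word "blob", the length in bytes, and a zero byte) followed by the contents. The function name `git_blob_hash` and the two sample file bodies are made up for illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the 40-hex-digit name git gives to a file with these contents."""
    # Git hashes a small header plus the raw bytes, not the bytes alone.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two hypothetical versions of a file that differ by a single character.
original = b"int main(void) { return 0; }\n"
tampered = b"int main(void) { return 1; }\n"

print(git_blob_hash(original))
print(git_blob_hash(tampered))  # a completely different 160-bit value
```

The result for a given file should match what `git hash-object` reports for the same bytes; a one-character change produces an unrelated hash.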
The hashing does not stop there. For any given state of the kernel source tree, git calculates a hash based on (1) the hashes of all the files contained within that tree, and (2) the hash of the previous state of the tree, which in turn depends on the state before it, and so on back through the entire history. So, for example, the hash for the kernel at the 3.0 release is 02f8c6aee8df3cdc935e9bdd4f2d020306035dbe. There is no way to change any of the files within that release – or within any previous release – without changing that hash. If anybody (even the kernel.org repository) were to present a 3.0 kernel with a different hash, it would be immediately apparent that something was not right.
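The chaining idea can be sketched in a few lines of Python. This is a deliberately simplified toy model, not git's real tree and commit object format: the functions `state_hash` and `chain`, the byte layout, and the sample file names are all assumptions made for illustration. What it does share with git is the property that matters here: each state's hash covers its files and the hash of the state before it.

```python
import hashlib


def state_hash(file_hashes: dict, parent_hash: str) -> str:
    """Hash one state of the tree: its file hashes plus its parent's hash."""
    h = hashlib.sha1()
    for name in sorted(file_hashes):  # sorted so the result is deterministic
        h.update(name.encode() + b"\0" + file_hashes[name].encode() + b"\n")
    h.update(b"parent " + parent_hash.encode())
    return h.hexdigest()


def chain(states: list) -> list:
    """Return the hash of every successive state in a history."""
    parent = ""  # the first state has no parent
    hashes = []
    for files in states:
        file_hashes = {n: hashlib.sha1(d).hexdigest() for n, d in files.items()}
        parent = state_hash(file_hashes, parent)
        hashes.append(parent)
    return hashes


# A hypothetical three-step history, and a copy with one old file altered.
honest = [
    {"main.c": b"v1"},
    {"main.c": b"v2"},
    {"main.c": b"v2", "util.c": b"helpers"},
]
tampered = [
    {"main.c": b"v1"},
    {"main.c": b"v2 plus backdoor"},
    {"main.c": b"v2", "util.c": b"helpers"},
]

for good, bad in zip(chain(honest), chain(tampered)):
    print("same" if good == bad else "DIFFERENT")
```

In this model, altering the second state changes its hash and the hash of every state after it, even though the final set of files is identical in both histories.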
You might be thinking that 02f8c6aee8df3cdc935e9bdd4f2d020306035dbe is an awfully long number to memorize and check. If we were dependent on humans to check the hash values, we would have reason to worry. But computers are very good at checking those values. And there are a lot of computers available to do that checking.
The machine I am typing this article on has a full copy of the kernel git repository. Actually, it has more than one. All kernel developers – and many people who are not kernel developers – have at least one copy of the repository somewhere. If an attacker were to corrupt the kernel.org repository, those other developers would notice the next time they updated their personal repositories – something that happens many times every day. If the attacker were to simply add new patches that had not gone through Linus Torvalds’s personal copy of the repository (which is not the copy on kernel.org), Torvalds would notice the next time he tried to push a change of his own. Git will see that the hash values are not what they should be and raise the alarm.
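The kind of check that happens on each update can be sketched roughly as follows. This is a toy model that treats a history as a simple list of state hashes, oldest first; real git histories are graphs and git's own logic is more involved. The function name `check_update` and the short placeholder hashes are hypothetical.

```python
def check_update(local_history: list, remote_history: list) -> str:
    """Compare a developer's copy of the history with the server's copy."""
    n_local, n_remote = len(local_history), len(remote_history)
    if remote_history[:n_local] == local_history:
        # The server agrees with everything we have, and may have more.
        return "up-to-date" if n_remote == n_local else "fast-forward"
    if local_history[:n_remote] == remote_history:
        # We have everything the server has, plus work of our own.
        return "local-ahead"
    # The server's past disagrees with the past we already hold.
    return "diverged"


# Placeholder hashes standing in for real 160-bit values.
mine = ["a1", "b2", "c3"]

print(check_update(mine, ["a1", "b2", "c3", "d4"]))  # fast-forward
print(check_update(mine, ["a1", "XX", "c3", "d4"]))  # diverged
```

Because every hash depends on all the history before it, a tampered server copy cannot agree with the hashes a developer already holds, so it shows up as a divergence rather than as an ordinary update.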
Kernel.org may seem like the place where kernel development is done, but it’s not; it’s really just a distribution point. The integrity of that distribution point is protected by the combination of clever software and thousands of copies of the repository distributed around the world. So when we say that we know the kernel source has not been compromised on kernel.org, we really know it.
The kernel.org administrators have shown themselves to be careful and capable people over many years. It seems like they’ve had some sleepless nights, with the prospect of quite a few more to come. It will be necessary to rebuild the kernel.org infrastructure and to figure out how the attacker got in. The integrity of those systems was lost; restoring it and protecting it into the future will take a considerable amount of work. But people running Linux need not worry about the integrity of their kernels; that is protected by defenses stronger than those of any single computer.