A Summer Spent on the Linux Kernel Virtual File System

Calvin Owens has learned a lot about bug hunting and fixing just by following the discussion among developers on the Linux kernel mailing list. He’s even contributed a few small driver fixes over the past year. But his first real deep dive into kernel development came during his Google Summer of Code internship with The Linux Foundation this year.

“I’ve always thought the kernel is deeply fascinating, and the opportunity to have the time to work on it in a meaningful way was a very exciting prospect,” said Owens, a computer science and music major, studying clarinet, at Southern Methodist University in Dallas.

Owens was one of 15 GSoC interns with The Linux Foundation, where he worked with Yongqiang Yang on efficient sparse file handling in the page cache of the Linux kernel Virtual File System (VFS). He developed a “sparse page deduplication” method to avoid backing sparse regions of files with physical pages full of zeros.

A method for sparse files

Sparse files contain regions, potentially very large, that do not exist on the disk, and reading from those regions returns all zeros, Owens said. “This is advantageous, especially for virtual hard drives for VMs, since the unused portions of the file don’t have to waste space on the disk,” he said.
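For readers who have not worked with sparse files, the behavior Owens describes can be observed from user space. The following is a minimal sketch, not part of his project: the helper names and the 16 MiB hole size are illustrative assumptions, and whether the hole actually saves disk space depends on the filesystem in use.

```python
import os
import tempfile

HOLE_SIZE = 16 * 1024 * 1024  # illustrative: 16 MiB that are never written


def make_sparse_file(path, hole_size=HOLE_SIZE):
    """Create a file whose first hole_size bytes are never written."""
    with open(path, "wb") as f:
        f.seek(hole_size)       # skip ahead without writing anything
        f.write(b"real data")   # only these bytes are written explicitly


def describe(path):
    """Return (apparent size, bytes allocated on disk or None)."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on Linux; it is missing on some platforms.
    blocks = getattr(st, "st_blocks", None)
    on_disk = blocks * 512 if blocks is not None else None
    return st.st_size, on_disk


def read_hole(path, length=4096):
    """Read the first `length` bytes, which fall inside the unwritten region."""
    with open(path, "rb") as f:
        return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.img")
        make_sparse_file(path)
        apparent, on_disk = describe(path)
        print("apparent size:", apparent)
        print("allocated on disk:", on_disk)
        print("hole reads as zeros:", read_hole(path) == bytes(4096))
```

On a filesystem that supports holes, the allocated size is typically far smaller than the apparent size, while the read still returns zeros either way.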

“However, the VFS layer of the kernel isn’t aware of what pages in a file are and are not sparse. So when you read a page corresponding to a sparse region in a file, the FS knows to return zeros, but the page cache dutifully allocates a physical page to back it,” Owens said.

Owens’ code adds logic to the VFS that records which regions are sparse and avoids allocating a page to back such a region unless it is later written to. The change should improve any workload that makes heavy use of sparse files, he said.
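The idea can be illustrated with a toy user-space model. This is only a sketch of the concept as described above, not kernel code and not Owens’ implementation: the class `ToyPageCache`, its methods, and the dictionary standing in for the disk are all hypothetical.

```python
PAGE_SIZE = 4096


class ToyPageCache:
    """Toy model: remember sparse pages without allocating memory for them."""

    def __init__(self, backing):
        self.backing = backing  # page index -> PAGE_SIZE bytes; missing index = hole
        self.pages = {}         # page index -> bytearray (pages actually allocated)
        self.sparse = set()     # page indices known to be holes, with no page behind them

    def read(self, index):
        if index in self.pages:
            return bytes(self.pages[index])
        if index not in self.backing:
            # Hole: note it and return zeros without caching a page.
            self.sparse.add(index)
            return bytes(PAGE_SIZE)
        # Real data: allocate a page and keep it in the cache.
        self.pages[index] = bytearray(self.backing[index])
        return bytes(self.pages[index])

    def write(self, index, offset, data):
        if offset < 0 or offset + len(data) > PAGE_SIZE:
            raise ValueError("write must fit inside one page")
        if index not in self.pages:
            # First write to this page: only now is a page allocated.
            source = self.backing.get(index, bytes(PAGE_SIZE))
            self.pages[index] = bytearray(source)
            self.sparse.discard(index)
        self.pages[index][offset:offset + len(data)] = data

    def allocated_pages(self):
        return len(self.pages)


if __name__ == "__main__":
    cache = ToyPageCache({0: b"x" * PAGE_SIZE})
    cache.read(5)                   # sparse page: nothing allocated
    print(cache.allocated_pages())
    cache.write(5, 0, b"hi")        # the write forces an allocation
    print(cache.allocated_pages())
```

In this model, reading a hole costs no cached page, so the memory stays available for pages that hold real data, which is the benefit Owens describes.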

“The page cache, being in RAM, is orders of magnitude faster than the hard disk itself. Keeping pages of zeros in it prevents pages of real data from being ready to go when they’re needed,” Owens said.

Headed upstream

While the method works for the major in-tree filesystems, it needs more work before it can be merged upstream. 

“Originally, I accomplished this by putting references to the ZERO_PAGE in the page cache radix tree for the file,” he said. “I’ve spoken to a couple kernel developers who aren’t wild about that solution, so I’m currently working on implementing it more cleanly.”

In the meantime, Owens says he finds kernel development to be rewarding work and hopes to find a job as a kernel developer after graduation.

“I learned a great deal about the inner workings of the kernel, VFS and memory management in particular,” Owens said. “Digesting huge swaths of dense kernel code was a bit overwhelming at first. Learning to deal with that was a very valuable experience.”

Editor’s note: See our previous profiles of GSoC interns Eduard Bachmakov, who contributed to the LLVM Clang Static Analyzer for the Linux kernel, and Anton Kirilenko, who worked with Linux Foundation Fellow Till Kamppeter to improve the PHP/MySQL application that manages submissions to the growing printer and printer driver database on the OpenPrinting website.

And if you’re interested in learning more about Google Summer of Code internships in 2014, please visit: http://www.google-melange.com/gsoc/homepage/google/gsoc2014

The next round of applications starts Feb. 3, 2014.