Programming Snapshot: Implementing Fast Queries for Local Files in Go


To find files quickly in the deeply nested subdirectories of his home directory, Mike whips up a Go program to index file metadata in an SQLite database.

…the GitHub Codesearch [1] project, with its indexer built in Go, at least lets you browse locally available repositories, index them, and then search for code snippets in a flash. Its author, Russ Cox, then an intern at Google, explained later how the search works [2].

How about using a similar method to create an index of files below a start directory and then run quick queries against it, such as “Which files have recently been modified?”, “Which are the biggest wasters of space?”, or “Which file names match the following pattern?”

Unix filesystems store metadata in inodes, which reside in flattened structures on disk that cause database-style queries to run at a snail’s pace. To inspect a file’s metadata, run the stat command on it and look at the file size and timestamps, such as the time of the last modification (Figure 2).

Figure 2: Inode metadata of a file, here determined by stat, can be used to build an index.

Newer filesystems like ZFS or Btrfs take a more database-like approach in the way they organize the files they contain but do not go far enough to be able to support meaningful queries from userspace.

Fast Forward Instead of Pause

For example, if you want to find all files over 100MB on the disk, you can do this with a find call like:

find / -type f -size +100M

If you are running the search on a traditional hard disk, take a coffee break. Even on a fast SSD, you need to prepare yourself for search times measured in minutes. The reason for this is that the data is scattered in a query-unfriendly way across the sectors of the disk.

Read more at Linux Pro Magazine