Open Source Libferris: Chasing the “Everything is a File System” Dream


The open source libferris project is a virtual file system that aims to provide a single file system interface for all data. I have been advancing libferris towards that goal over the last ten years. Over that time, libferris has gained support for mounting relational databases; physical devices like printers, webcams, and scanners; composite files like Berkeley DB and XML files; applications like Amarok, Firefox, emacs, pulseaudio, XWindow, dbus, and evolution; and more recently web services like GDrive, YouTube, Vimeo, and Flickr, as well as many other things.

Libferris gives you the same filesystem access to all of the above data sources. Why shouldn’t you be able to use your text editor of choice to edit a comment on Flickr, or your preferred image viewer to look at a virtual jpeg created from a webcam? Providing a filesystem API lets all applications access (and potentially update) anything libferris can get at. In order to provide access to as many things as possible, libferris runs in the user address space. Things like accessing relational databases shouldn’t have to happen from inside the Linux kernel itself. Once a file system is mounted you can dig into the metadata, making things easier to find, filter and index.

I have some examples of command line use of some of these features later in the article. But first I’ll give you a little more background on some of the key features.

Access Metadata

Early on in the project’s life, it became clear that using just directories and files made it unnatural and difficult to access metadata, for example, little pieces of information like the width and height of an image or the artist of an audio file. Modern filesystems now offer an Extended Attribute (EA) interface which allows key-value pairs to be associated with files. Building on that idea, the EA interface was also virtualized in libferris to allow the key-value pairs to come from and go to many places.

Having “many places” for metadata to come from and go to might seem confusing at first, so hopefully a few examples will clear things up a little. Reading the “width” EA on an image file will cause libferris to work out the width in pixels of the image and return that to you as the value of the EA. If you have kernel EA available in your home directory, writing to the “foo” EA on an image file in libferris will cause that EA to be stored by the Linux kernel filesystem in an on-disk EA. If you don’t have kernel EA support, or the file permissions do not allow you to write an EA on a file, then libferris will store that EA for you in an RDF repository. When you mount a relational database or XML file, the EA will come from the database tuples or XML attributes. So most of the time you don’t have to care about the details: the metadata storage will try to “do what you mean” and has the RDF fallback.

The fallback to using RDF allows you to store metadata on any virtual filesystem regardless of what the underlying storage permits you to do. For example, you can annotate a website and libferris will store that for you in RDF and make it available through the same EA interface that all metadata is available through. For the semantic web fans out there, you can also smush your RDF metadata and link multiple URLs which should logically share metadata.
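The fallback order described above can be sketched in plain shell. This is only an illustration of the idea, not how libferris implements it. It assumes the setfattr and getfattr tools from the attr package for kernel EA, and it uses a tab-separated text file as a stand-in for the RDF repository. The function names and the EA_FALLBACK variable are invented for this sketch.

```shell
#!/bin/sh
# Fallback store: a "path<TAB>key<TAB>value" text file standing in for the
# RDF repository. The location is a made-up default for this sketch.
EA_FALLBACK="${EA_FALLBACK:-${TMPDIR:-/tmp}/ea-fallback.tsv}"

# set_ea FILE KEY VALUE: try a kernel EA first, then the fallback store.
# Note: setfattr decodes values that start with 0x or 0s as hex or base64.
set_ea() {
    if command -v setfattr >/dev/null 2>&1 &&
       setfattr -n "user.$2" -v "$3" "$1" 2>/dev/null; then
        return 0
    fi
    # Kernel EA unavailable or not permitted: record the value on the side.
    printf '%s\t%s\t%s\n' "$1" "$2" "$3" >> "$EA_FALLBACK"
}

# get_ea FILE KEY: read the kernel EA if present, else the newest fallback entry.
get_ea() {
    if command -v getfattr >/dev/null 2>&1 &&
       getfattr --only-values -n "user.$2" "$1" 2>/dev/null; then
        return 0
    fi
    [ -f "$EA_FALLBACK" ] || return 1
    awk -F '\t' -v f="$1" -v k="$2" \
        '$1 == f && $2 == k { v = $3; found = 1 }
         END { if (!found) exit 1; print v }' "$EA_FALLBACK"
}
```

With these definitions, set_ea photo.jpg foo bar followed by get_ea photo.jpg foo returns the stored value whether or not the underlying filesystem accepts user EA, which is the "do what you mean" behaviour in miniature. Values containing tabs or newlines are not handled by this simplified side store.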

Sort and Filter Metadata

A single file system API allows command line tools like ls, cp, cat, I/O redirection and others to dig into any data source that libferris can access. In a similar way, having all metadata offered through the EA key-value interface allows the sorting and filtering support in libferris to be applied to any metadata. If you want to see the images in a directory sorted by mime type, aperture size, and then by modification time, you can do it with ferrisls. Each EA can have type information associated with it, and most do. So libferris knows that the “size” and “width” EA are numbers and can sort the values as you would expect.
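To see why the type information matters, here is a small sketch using only the standard sort tool rather than ferrisls itself. The file names and widths are made up. Sorting the same column as text and as a number gives different orders, and the numeric order is the one you would expect for an EA like “width”.

```shell
#!/bin/sh
# Hypothetical listing: a file name and its "width" EA, one pair per line.
widths() {
    printf '%s\n' 'a.jpg 1024' 'b.jpg 640' 'c.jpg 80'
}

# Compare the second column as text: "1024" sorts before "640" and "80".
sort_as_text() {
    widths | LC_ALL=C sort -k2,2
}

# Compare the second column as a number, which is what typed EA make possible.
sort_as_number() {
    widths | LC_ALL=C sort -k2,2n
}

sort_as_text
echo '--'
sort_as_number
```

The text sort leaves a.jpg (1024) first, while the numeric sort puts c.jpg (80) first and a.jpg last.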

Index and Search

Once the file system and metadata interfaces were available it made sense to add index and search support. There are many implementations of indexing to choose from, and no single implementation of index and search provides the best results in all situations. You might want to use a small memory mapped file on a low power NAS to provide simple locate(1) functionality, while on a larger file server you might want to use PostgreSQL for your indexing needs. The results from a query can also be returned as a libferris filesystem. So you can directly “cat” or ferriscp the results of a search.

You can also federate multiple libferris indexes together to combine results from multiple locations, such as a local index on your home directory on the desktop machine and a file server index.

Getting hands on…

OK, so enough with the history and overview, and on with how to install libferris (farther below) and some examples of things that you might find useful to do with it.

Access and modify files

Consider the below XML file as an example input. Using ferrisls you can reach right into an XML file and list the elements at any level. This is a common interaction style in libferris for files which might be thought of as composite such as XML, db4, archives, csv and so on. You don’t have to worry about mounting, just read the file as though it was a directory.

If you know a little bit more about the structure of the XML file, you can select which XML attributes you want using the libferris filesystem Extended Attribute interface. The --show-ea option to ferrisls tells it what EA you would like to see in the listing. In this case, the filesystem EA are created from and send data to XML attributes.

$ cat basic.xml
<top>
  <person name="alice" age="25" />
  <person name="bob"   age="32" />
  <person name="cathy" age="35" >boo!</person>
</top>
$ ferrisls basic.xml/top
                                 0                    alice 
                                 0                    bob 
                                 0                    cathy 
$ ferrisls --show-ea=name,age basic.xml/top
alice   25
bob     32
cathy   35
$ fcat basic.xml/top/cathy
boo!
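Because the listing is plain text with one file per line, it can be fed straight into the usual text tools. The sketch below filters people by age with awk. So that it runs without libferris installed, the listing function is a stand-in that prints the name and age columns copied from the transcript above. In real use it would be replaced by the ferrisls --show-ea=name,age call. The function names are made up for this example.

```shell
#!/bin/sh
# Stand-in for: ferrisls --show-ea=name,age basic.xml/top
# It prints the listing from the article so the sketch runs anywhere.
listing() {
    printf '%s\n' 'alice   25' 'bob     32' 'cathy   35'
}

# older_than AGE: print the names whose age column is greater than AGE.
older_than() {
    listing | awk -v min="$1" '$2 > min { print $1 }'
}

older_than 30
```

Run as written, this prints bob and cathy, one per line.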

Libferris can also perform modifying actions on many of its virtual filesystems, and certainly for mounted XML files. The below example uses the ferris-redirect command to stream data from its standard input into any libferris file. Libferris also has FerrisFUSE, which provides an implementation of Filesystem in Userspace (FUSE). So you can access libferris with any tool that can use a Linux in-kernel filesystem if you like. For example, you could also mount the basic.xml file using FerrisFUSE and just use normal bash redirection to update it.

I have reimplemented many of the basic file system tools for two reasons: so that FUSE is not always needed (no explicit mounting and unmounting) and to be able to provide extended functionality such as sorting and filtering output on any metadata that the filesystem offers. You can consider the sequence | ferris-redirect filename to be much like > filename in bash redirection, but with ferris-redirect you can write that data to more places. As you can see below, I’ve updated the text content of the bob XML element using I/O redirection.

$ echo "a really nice guy" | ferris-redirect basic.xml/top/bob
$ cat basic.xml
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<top>
  <person age="25" name="alice"/>
  <person age="32" name="bob">a really nice guy
</person>
  <person age="35" name="cathy">boo!</person>
</top>
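The comparison between | ferris-redirect filename and > filename can be made concrete with a small wrapper. This is a sketch only. The name write_to is made up, and the wrapper uses nothing beyond the plain ferris-redirect target form shown above. When ferris-redirect is not installed it falls back to ordinary redirection, which of course only works for real local files.

```shell
#!/bin/sh
# write_to TARGET: stream standard input into TARGET.
# Prefers ferris-redirect so TARGET may be any libferris path; otherwise
# behaves like "> TARGET" for an ordinary local file.
write_to() {
    if command -v ferris-redirect >/dev/null 2>&1; then
        ferris-redirect "$1"
    else
        cat > "$1"
    fi
}
```

A script can then use echo "a really nice guy" | write_to notes.txt everywhere, and gain the ability to target paths such as basic.xml/top/bob on machines where libferris is available.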

If you prefer binary files to XML, you can mount Berkeley db4 in the same way as XML with libferris. Below I use the “fcreate” tool to make a new empty Berkeley db4 file. The “rdn” means Relative Distinguished Name and is a bit of a holdover from early use of some LDAP naming. I will likely convert “rdn” instances to “filename” throughout the code in the future. Because a mounted db4 file and a mounted XML file are exposed in a very similar way, you can run XQuery on db4 instead of XML files if you want to speed up some of the lookups in your query.

$ fcreate --create-type=db4 --rdn=foo.db .
Created new context: file:///tmp/foo.db
$ ls -lh foo.db
-rw-------. 1 ben ben 8.0K Dec 30 19:52 foo.db
$ date | ferris-redirect foo.db/file1
$ fcat foo.db/file1
Mon Dec 30 19:52:41 EST 2013
$ db_dump -p foo.db
VERSION=3
format=print
type=btree
db_pagesize=4096
HEADER=END
 file1
 Mon Dec 30 19:52:41 EST 2013a
DATA=END

Mount relational databases

Libferris can also mount relational databases as filesystems using QtSql, and it also has explicit support for mounting PostgreSQL. The extended attribute interface is extremely useful here, as the tuples in a table map nicely to files in the filesystem and the columns in a tuple map to the EA for each file. Below I create a simple SQLite database with only one table. Using ferrisls -lh shows you that the file for each tuple is named using the “name” column from the table. You can list any table using the path format databasename/tablename as shown. For PostgreSQL databases you can also access PostgreSQL functions through the filesystem.

The ferrisls -0 option complements the normal -l long listing option to ls. The main difference is that with -0 you ask the filesystem itself what it thinks are the most interesting EA for you to see. Sometimes there are more interesting things than the size, protection bits, and mtime to see. In this case, every column from the database table is considered “interesting” along with an EA telling you what the primary key for the tuple is. This is the repeated “id” at the end of the ferrisls -0 output. The --xml and --json options build on the -0 option but produce output in XML or JSON format respectively.

$ sqlite3 test.db
SQLite version 3.8.2 2013-12-06 14:53:30
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table trees ( name varchar, count int, id integer primary key autoincrement ); 
sqlite>  insert into trees (name,count) values ( 'General Sherman', 5 );
sqlite>  insert into trees (name,count) values ( 'Gum', 44 );
sqlite>  insert into trees (name,count) values ( 'Mahogany', 9 );
sqlite> select * from trees;
General Sherman|5|1
Gum|44|2
Mahogany|9|3
$ ferrisls -lh test.db/trees
                                 56                   General Sherman 
                                 45                   Gum 
                                 49                   Mahogany 
$ ferrisls -0 test.db/trees
General Sherman 5       1       General Sherman id
Gum     44      2       Gum     id
Mahogany        9       3       Mahogany        id
$ ferrisls --xml test.db/trees
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<ferrisls>
  <ferrisls count="" id="" name="trees" primary-key="id" url="file:///tmp/test.db/trees">
    <context count="5" id="1" name="General Sherman" primary-key="id"/>
    <context count="44" id="2" name="Gum" primary-key="id"/>
    <context count="9" id="3" name="Mahogany" primary-key="id"/>
  </ferrisls>
</ferrisls>
$ ferrisls --json test.db/trees
{
 "children" : { "1" : { "name" : "General Sherman" }, 
                 "2" : { "name" : "Gum" }, 
                 "3" : { "name" : "Mahogany" } }, 
 "self" : { "name" : "trees" } 
}

Update databases

Databases can also be updated through the filesystem by writing to the EA as shown below. The -a option to ferris-redirect causes the data to be written to the given (extended) attribute instead of into the main content of the file. The screenshot following is the “ego” file manager which allows you to directly click to edit EA. Much of the functionality of ego is accessed through the many side panels, drag and drop, and context menus. If mounting SQLite seems interesting, see my longer blog post for more details including creating virtual tables in SQLite to access libferris and exposing an apache access_log file through libferris as a virtual table.

$ echo -n Gum Tree | ferris-redirect -a name test.db/trees/2
$ ferrisls -0 test.db/trees
General Sherman 5       1       General Sherman id
Gum Tree        44      2       Gum Tree        id
Mahogany        9       3       Mahogany        id

[Screenshot: ego sqlite window]
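For readers who think in SQL, the mapping of one file per tuple and one EA per column corresponds roughly to the statements below. This is a sketch that uses the sqlite3 command line tool directly, not libferris. It assumes that a path like test.db/trees/2 addresses the tuple whose primary key id is 2, as in the transcript above. The function names are made up, and identifiers are passed through without any checking, so it is not suitable for untrusted input.

```shell
#!/bin/sh
# read_ea DB TABLE ID COLUMN: the SQL behind reading one EA of one tuple "file".
read_ea() {
    sqlite3 "$1" "select \"$4\" from \"$2\" where id = $3;"
}

# write_ea DB TABLE ID COLUMN VALUE: the SQL behind writing one EA.
# Single quotes in the value are doubled so they survive SQL quoting.
write_ea() {
    value=$(printf '%s' "$5" | sed "s/'/''/g")
    sqlite3 "$1" "update \"$2\" set \"$4\" = '$value' where id = $3;"
}
```

Under those assumptions, write_ea test.db trees 2 name 'Gum Tree' has the same effect on the table as the ferris-redirect -a name command shown earlier, and read_ea test.db trees 2 count reads back a single column.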

Mount Google spreadsheets

Looking back at how SQLite database tables are exposed through the filesystem, it isn’t much of a stretch to see that a spreadsheet can be exposed as one file per row and EA to get access to each column. Libferris has support for many of Google’s services such as Spreadsheets, YouTube, and Drive. The spreadsheet access is at the cell level so you can create and update spreadsheets from the command line as I have done in the below example. There is more information on mounting Google spreadsheets with libferris.

$ ferrisls -0 google://
docs
drive
spreadsheets
youtube
$ echo 20 | ferris-redirect -a c  google://spreadsheets/smalltest1/Sheet5/2
$ echo  7 | ferris-redirect -a c  google://spreadsheets/smalltest1/Sheet5/3
$ echo 14 | ferris-redirect -a c  google://spreadsheets/smalltest1/Sheet5/4
$ echo -n "=sum(C2:C4)" | ferris-redirect --ea c google://spreadsheets/smalltest1/Sheet5/1
1                       41
2                       20
3                       7 
4                       14 

Mount Google Drive

Support for mounting Google Drive was added earlier in 2013. A major feature of online storage sites is the ability to selectively share your files with other people. To do this with a mounted Google Drive, just echo the email address of the person you wish to share an uploaded file with into the “shares” EA with libferris.

$ ferriscp goodstuff.txt google://drive/
$ echo "someone@example.com" | ferris-redirect -a shares "google://drive/goodstuff.txt"

Mount applications

Moving along to mounting some applications, the examples below show mounting Amarok, the clipboard, GStreamer, sane, and Flickr. Amarok has three main directories in the filesystem: control, current, and playlist. Running ferrisls on amarok://control will also show you the current state of Amarok. In the control directory, the play/pause toggle file controls whether Amarok is playing, and you can set the volume by writing a value (percent of full volume) into the volume file. The Amarok playlist is exposed as a virtual directory and you can directly “copy” the tracks from that virtual directory to another filesystem if you like.

The clipboard filesystem is backed by Klipper so you might not see it unless you are using KDE. The GStreamer filesystem allows you to access media sources and get jpeg images and video streams through the virtual filesystem. So you can get at your laptop or phone camera over the network without needing to think about how to read image data from those devices. Because the scanner and the Flickr API are both exposed as filesystems, you can copy data from the scanner directly to the web. Or you can grab your data from any other libferris filesystem, such as GStreamer or indeed another site implementing the Flickr API, and copy it to the web.

$ echo  1 | ferris-redirect amarok://control/toggle-play-pause
$ echo 20 | ferris-redirect -T  amarok://control/volume
$ echo 89 | ferris-redirect -T  amarok://control/volume
$ ferrisls -0 amarok://control
pause   0
play    1
toggle-play-pause       1
volume  89
$ ferrisls -0 amarok://playlist
1  2:56  Mark Forry, Yvette ...  Free Software Song
...
$ ferriscp amarok://playlist/1 /media/portableplayer/
$ ferrisls -0 xwin://localhost/clipboard
0       bueller... bueller....
1       examples
...
$ fcat xwin://localhost/clipboard/0
bueller... bueller....
$ fcat gstreamer://capture/lid.jpg | okular -
$ ferriscp scanner://default/gray-full-300/scan.jpg  flickr://me/upload

Authenticate web services

Authentication with web services and other data stores is set up with ferris-capplet-auth. This handles OAuth handshaking with sites providing an implementation of the Flickr API, Facebook, Vimeo, and Google services. This is also where you can set up authentication information for PostgreSQL, Zoneminder, and FerrisREST servers. This way URLs in libferris do not contain user credentials and certainly don’t ever have passwords in them, so the risk of accidentally sharing a URL with authentication information in it is lessened. You can also use ferris-capplet-auth without a GUI as shown below. You will be given the URL to visit in order to authenticate libferris and asked for tokens at appropriate times to complete the handshake. Details on authentication are in a previous linux.com article.

$ ferris-capplet-auth --auth-service gdrive --auth-with-site gdrive
...
$ ferris-capplet-auth --list-auth-sites
facebook
flickr
23hq
pixelpipe
vimeo
google

Alternative access

Sometimes it can be useful to access things that libferris can mount through other systems, sort of like looking at the reverse of everything is a file system. Currently you can access libferris as a KIO slave, through FUSE, as a KDE Plasma data source, as an SQLite virtual table, as a virtual Xerces-C++ DOM, and through XQuery using XQilla. There is also the start of language bindings, mainly targeting Guile and Perl. Having Perl support at the IO::All level is the ultimate goal. Currently support is there to tie to a file handle, allowing the below code to create a PDF with the current time in it.

#!/usr/bin/perl
use Time::localtime;
use libferris;
$trunc = 0;
my @options = ();
push(@options,"trunc") if $trunc;
tie( *FERRIS, 'libferris', '>printer://Cups-PDF/foo.pdf', @options );
print FERRIS ctime() , "\n";
untie( *FERRIS );

Install libferris

If you’re sufficiently intrigued by the project, go try it out. I have updated my Fedora 20 packages on the openSUSE Build Service which is available at Fedora_20. After downloading home:monkeyiq.repo from the above link to /etc/yum.repos.d/ you should be able to install libferris using the below command. These are the packages I use on my laptop and the build I used to run many of the examples in this article. I also have some packages for Debian ARM hardfloat.

# yum install libferris-suite
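After installing, a quick check like the following can confirm that the command line tools used in this article are reachable. It is a sketch that assumes the package puts these commands on your PATH. The command names are taken from the examples above, and the missing_tools function name is made up.

```shell
#!/bin/sh
# missing_tools NAME...: print every NAME that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Command names taken from the examples in this article.
missing=$(missing_tools ferrisls fcat ferriscp ferris-redirect fcreate)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "not found on PATH:" $missing
else
    echo "all libferris commands found"
fi
```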

One strength of libferris is also one of its weak points. Because there is support for mounting many data stores, extracting metadata from many files, and different index implementations, libferris uses other libraries for many of these things, which can make building it a little daunting for first-time users. Many of the features are optional and will be disabled at configure time if the libraries are not installed on your system. Some lower level dependencies have been split out of libferris itself, such as libferrisstreams, stldb4, and fampp, which provide some C++ std::iostream support, an interface to Berkeley db4, and file change monitoring respectively. The main things that libferris needs which you might not have installed are Xerces-C 3, Boost, and development headers for a mime engine (KDE, GNOME, efsd, or libfile). If you want to mount some of the web services then qjson and qoauth need to be installed too.

More to come

In the future I hope to add support for mounting more web services and enhance the web front end to libferris. Mounting web services is more of a challenge than local sources because of throttling, network latency, and the REST API changing on the whim of the company running the site. I will also continue my pursuit to have libferris included in a mainstream Linux distribution to help make it simple to install for more people.