Libferris and SQLite–A Powerful Combination, Part 1

328

The libferris virtual filesystem allows you to mount things that are not normally thought of as filesystems. Examples include XML files, Berkeley db4 ISAM files, relational databases, system log files, as well as applications like Firefox, Evolution, XWindow, emacs, and Amarok.

Version 1.3.0 of libferris has enhanced support for SQLite databases. This includes both mounting an SQLite database as a filesystem and exposing any libferris virtual filesystem to SQLite as a virtual table. You can have as many libferris backed virtual tables as you like in SQLite, so you can join information from multiple sources (XML, syslog, db4 files) using SQL queries.

In this article series, I will explain how to mount SQLite as a filesystem so you can manipulate it with the familiar command line tools (ls, cat, redirection), how to go the other way and expose libferris filesystems like mounted XML to SQLite for manipulation. I’ll finish up with a discussion of how libferris lets you mount semi structured plain text files like log files as a filesystem and how you can then expose those to SQLite for analysis.

libferris tries to perform what are traditionally seen as filesystem mounting operations for you automatically. For example, if you list an XML file using ferrisls without using the -d option to list the entry itself rather than its contents, then libferris will try to mount that XML file and show it to you as a filesystem. This allows clients to access and store information into XML and db4 files without needing to know any of the details. A URL could be for a normal directory on an ext3 filesystem or for part of an XML file, it doesn’t matter.

Mounting SQLite databases is now done automatically too. Consider the simple SQLite database shown below:

$ touch   /tmp/test.sqlite
$ sqlite3 /tmp/test.sqlite
SQLite version 3.5.9
Enter ".help" for instructions
sqlite> create table foo ( id int primary key, msg varchar );
sqlite> insert into foo values (4,'hello there');
sqlite> insert into foo values (5,'foo bar');

Without doing anything special, you can get at the data in this SQLite database through any libferris client. If you prefer you can also use FerrisFUSE to expose it through the Linux kernel to any program. For those unfamiliar, FUSE allows a filesystem that appears through the Linux kernel to be served by an application running as a normal application process. This means FerrisFUSE runs like a daemon and can use libferris to store information while making it available to the Linux kernel like any other traditional (ext3) filesystem. All of a sudden you can run vi or emacs to edit a tuple in an SQLite database. The libferris distribution contains clones of the traditional ls, cat, cp, etc. tools which use libferris to perform their work. This way you don’t have to setup any FUSE mounts or the like if you just want to ls, cat or redirect information into a libferris filesystem.

When you mount a table or SQL query result from a relational database with libferris, each column from a tuple is exposed using the standard filesystem Extended Attribute (EA) interface. The EA interface allows key=value pairs to be associated with a file. You can think of EA simply as metadata for files and directories, the familiar lstat(2) information of size, mtime, owner and group of a file are exposed as EA by libferris. In fact, you can get and set the byte contents of a file through the content EA of a file. Making everything available through the EA interface allows you to filter and sort a directory based on lstat metadata as well as user supplied metadata–it’s all just EA.

Shown below, I first list the table from the SQLite database and then view a tuple with the fcat command. When you read a tuple as a file, libferris exposes it as an XML document. In places that do not have a direct mapping, libferris normally tries to do the most sane thing it can. For a tuple, you have to expose all the fields in some delimited format because libferris can’t know what fields are the most important. So using XML makes the most sense because it offers explicit key=value serialization and is easily digestible by other applications. You can also use the -a option to fcat to select a single single column (EA) which avoids the XML.

$ ferrisls -lh  /tmp/test.sqlite/foo
40 4
36 5
$ fcat /tmp/test.sqlite/foo/5

$ fcat -a msg /tmp/test.sqlite/foo/4
hello there

As you can see above, the long listing format from ferrisls only shows the size and name of each file. All of the information shown by ls is exposed as EA by libferris. But there is no modification time, or file ownership privileges stored for a tuple in the SQLite database, so no value is available. This makes the ferrisls long listing seem fairly spartan.

You can also ask a libferris directory what it thinks is the most interesting EA (the recommended-ea) for the user. The ferrisls -0 option uses this feature to let you dig into strange filesystems. Shown below, you can see the msg and id column from the database are shown when using -0 to see the recommended-ea. Because the columns that are shown by recommended-ea can be arbitrary EA, you might like to use the –show-headings option to have ferrisls tell you the name of each EA.

$ ferrisls -0  /tmp/test.sqlite/foo
4 hello there 4 id
5 foo bar 5 id

$ ferrisls -0 --show-headings /tmp/test.sqlite/foo
id msg name primary-key
4 hello there 4 id
5 foo bar 5 id

ferrisls also lets you see its output in XML or RDF/XML using the –xml or –rdf options respectively. Apart from being able to easily parse the output, the XML output lets you see the names of the EA for each value.

$ ferrisls --xml  /tmp/test.sqlite/foo




url="file:///tmp/test.sqlite/foo">





While such read access is a huge step forward for unlocking the data in an SQLite file, you might like to update data as well. Because the redirection commands in bash only know about kernel filesystems, there is the ferris-redirect command which performs a similar role to the shell >, >> and >| family of primitives. The ferris-redirect command reads from standard input and writes that data to a nominated URL. The -T option to ferris-redirect truncates the file before writing to it, and the –ea command is used to select an EA to write to instead of writing to the contents of the file.

$ echo -n "welcome to libferris" 
| ferris-redirect -T --ea msg /tmp/test.sqlite/foo/5
$ ferrisls --xml /tmp/test.sqlite/foo




url="file:///tmp/test.sqlite/foo">

msg="welcome to libferris"
name="5"
primary-key="id"/>



I added support for mounting PostgreSQL database to libferris years ago. There are still some features unique to the PostgreSQL support in libferris such as calling database functions through a virtual filesystem. You can also mount other databases that QtSQL can access using libferris.

In part 2 of this series, we’ll examine doing the opposite: exposing libferris as a relation to SQLite.