Libferris and SQLite–A Powerful Combination, Part 2

198
In Part 1, we discussed how to mount SQLite as a filesystem so you can manipulate it with the familiar command line tools (ls, cat, redirection). Let’s now turn our attention to doing the opposite: exposing libferris as a relation to SQLite. You might have some data in an XML file that you need to join with data stored in Berkeley db4 or an existing SQLite database. Your first thought might be to whip up a Perl script to bring the data together. With libferris you don’t have to write any of this glue code.

As SQLite includes support for virtual tables, you can expose any libferris virtual filesystem as a table to SQLite. Because XML is just another virtual filesystem to libferris you can easily expose it to SQLite and join the data with a normal SQL query.

To use libferris from SQLite, you’ll need an sqlite client binary that can load shared libraries. Allowing binary libraries to be loaded into your SQLite session is a bit of a security issue, the code could potentially do anything. As such, the packaged client for Fedora does not allow loading shared libraries. It is very easy to download the sqlite source and recompile it with shared library support though, and you only have to install the single sqlite3 binary somewhere in order to get at libferris virtual tables.

First, you’ll need to load the libferris support using the .load directive at an sqlite prompt. Data is exposed into SQLite as a virtual table using the create virtual table statement. The format for this statement varies depending on the code implementing the virtual table. For libferris, the parameters are: The URL to expose, a string containing key=value pairs to effect this exposure, and the EA you want exposed, expressed in the normal create table format. Note that you can only specify the name and datatype for each EA (column) you want, you can’t put constraints on them like foreign or primary keys. To avoid confusion, I’ll call the URL supplied as the first parameter to create virtual table the “base URL”.

Issuing the below two statements at an sqlite prompt will bring libferris support into the current session and create a virtual table which shows the /tmp/test directory with various lstat(2) information as well as the MD5 checksum and file contents as columns in the new SQLite table. Note that there is an implicit first column in the created virtual table which is always the file URL (primary key) so you don’t have to specify that manually.

$ sqlite3

.load libferrissqlitevtable.so
create virtual table fs using libferris(
'file:///tmp/test',
'',
name text,
size int,
md5 text,
path text,
mtime int,
atime int,
ctime int,
is-dir int,
mtime-display text,
content text );

There is a short list of common EA descriptions that are available in libferris. If you are unsure what an EA looks like, use the –show-ea option to ferrisls to select which EA you want to see for a URL. –show-ea takes a comma separated list of which EA you want to see. The special ea-names EA will list the names of all EA that are available for a URL as a comma separated string. The second command below generates a newline separated version in a temporary file.

$ ferrisls --show-ea=name,foo URL
$ fls --show-ea=ea-names URL | tr ',' 'n' >|/tmp/o
$ vi /tmp/o

Turning our attention back to the create virtual table SQLite function again. There are currently two parameters which can be specified in the second parameter as comma separated key=value pairs. If you specify recursive=true then the table exposes the entire filesystem recursively from the base URL, otherwise only the directory of the base URL is exposed. This lets you choose if you want the directory or entire filesystem tree from the base URL to be exposed. If you specify add-rowid-to-base-filesystem=true then stable 64bit row-ids are added as needed to files in the exposed table. You need stable row-ids to allow the SQL update .. where and insert into statements to work as expected.

There are a few things to keep in mind when exposing a filesystem as a relational table. For example, EA that are shown in ferrisls and native libferris clients with a ‘-‘ in them are translated to use ‘_’ instead because SQLite doesn’t like the dash. Likewise, ‘:’ is translated into ‘_colon_’. So in the above create virtual table, the is-dir and mtime-display EA will be available in the SQL columns is_dir and mtime_display respectively.

To get at the above virtual fs table, you can use the normal SQL select command:

select is_dir,name,url from fs 
order by url desc;
...
0|df2|file:///tmp/test/df2
0|df1|file:///tmp/test/df1
...

Because some calculated EA like the MD5 checksum are exposed as columns, you can grab those from a query too. The libferris virtual table implementation includes code to allow you to pick files in a where clause using their path or URL and libferris will optimize the lookup in a manner similar to (but not quite as efficiently) as if you had created a relational database index on those columns. Note that if you use the URL in the where clause, you need to specify the file:// part too, and the URL must exactly match what you are looking for. If you are not getting the result you expect when searching for yourURL, use ferrisls –show-ea=url yourURL and make sure the result exactly equals yourURL.

select md5,size,mtime,mtime_display,name,url from fs 
where path = '/tmp/test/df1';

Exposing a normal filesystem directory to SQLite is not very exciting, but keep in mind that anything you can see through libferris you can see through an SQLite virtual table backed by libferris. So if you have an XML file:

$ cat customers.xml 





You can expose it to SQLite and query based on any attributes in the XML file as shown below.

$ cat test-xml.sql 
sqlite> .load libferrissqlitevtable.so
sqlite> create virtual table fs using libferris(
'customers.xml/customers',
'',
name text,
size int,
id int,
givenname text,
familyname text
content text
);

sqlite> select * from fs
order by givenname desc;

file:///ferris/exposures/sqlitevtab/customers.xml/customers/131|131|0|131|Ziggy|Stardust
file:///ferris/exposures/sqlitevtab/customers.xml/customers/3|3|0|3|Foo|Bar
file:///ferris/exposures/sqlitevtab/customers.xml/customers/15|15|0|15|Bobby|McGee

You can also create a filesystem index with libferris and expose desktop search results as a virtual sqlite table. For example, the following searches an image collection for a particular bridge using the EA Query (eaq://) libferris virtual filesystem. For more information on creating and using the desktop search functionality in libferris see this Linux Journal article.

$ cat query.sql
.load libferrissqlitevtable.so
create virtual table fs using libferris(
'eaq://(url=~Akashi-Kaikyo-Bridge)',
'',
name text,
size int,
md5 text,
path text,
mtime int, atime int, ctime int,
is-dir int,
mtime-display text,
width int, height int, content text );

select is_dir,width,height,url from fs
order by url desc;
...
0|1840|1232|file:///digital_pictures/2004-japan/Akashi-Kaikyo-Bridge-main-section2.png
0|1840|1232|file:///digital_pictures/2004-japan/Akashi-Kaikyo-Bridge-main-section.png
...

Updating XML Through SQL Commands

Imagine that you have the customers.xml file shown below. You have exposed it to SQLite and used it to join with information from tables in the database to resolve queries. All is well, but you would like to add a new customer to the XML file. While you might consider opening up a text or XML editor on the XML file itself, you would really like to just keep using SQL.

$ cat customers.xml





Currently you have to specify everything including the URL of the new file you wish to create in the SQL INSERT statement as shown below. Note that the ID does not need to be part of the URL, you only need to tell libferris you want to create a new customer element. A planned addition here is to allow libferris to understand the autoincrement keyword during the creation of the virtual SQLite table. This would let you be able to supply a NULL for the ID column and have one automatically generated for you, even though this is backed by an XML file.

.load libferrissqlitevtable.so
create virtual table fs using libferris(
'file:///tmp/customers.xml/customers',
'add-rowid-to-base-filesystem=true',
id int,
givenname text,
familyname text
);

insert into fs values (
'file:///tmp/customers.xml/customers/customer',
300,
'this is the new given name',
'new sirname here' );

At this stage your customers.xml file will include a new element for the entry you INSERTed above:

$ cat customers.xml



...

givenname="this is the new given name" id="300"/>

The little differences that surfaced for the SQL INSERT statement completely disappear when using SQL UPDATE as shown below:

update fs set familyname = 'Anderson' where id = 300;

As you can see above, setting add-rowid-to-base-filesystem=true in the create virtual table statement causes libferris to save ferris-sqlite-rowid data for files in the virtual filesystem. If these attributes can be saved into the file itself, either directly as in XML, or to filesystem Extended Attributes then they are stored there. If neither of these are possible then libferris stores the ferris-sqlite-rowid data in a personal RDF store.

In Part 3, we’ll discuss how libferris lets you mount semi structured plain text files like log files as a filesystem and how you can then expose those to SQLite for analysis.