Setting up an ARM Based Cassandra Cluster with Beagle Bone Black

1413

A great project to try on cheap ARM boards such as the BeagleBone cluster is to set up a database cluster. For a developer, this is useful both for storage for projects and to gain experience administrating NoSQL databases with replication. Using my existing three BeagleBone Black cluster I decided to try using the Cassandra database. Cassandra is easy to use for those already familiar with SQL databases and is freely available. All of these steps were done on an Ubuntu install and should work on any ARM board running an Ubuntu based OS.

To get started, you need the Cassandra binaries. I was unable to find a repository that had an arm version of Cassandra so I downloaded straight from the apache site and untared it. You can go to http://cassandra.apache.org/download/ and use wget to get the gzip file from one of the mirrors. The version I downloaded was apache-cassandra-2.0.2-bin.tar.gz. Once you have it on each machine, place it in a directory you want to use for its home, for example /app/cassandra and then unzip it:

tar -xvzf apache-cassandra-2.0.2-bin.tar.gz

Now you have everything you need to configure and run cassandra. To run in a cluster, we need to set up some basic properties on each machine so they know how to join the cluster. Start by navigating to the conf directory inside the folder you just extracted and open the cassandra.yaml file for editing. First find listen_address and set it to the ip or name of the current machine. For example for the first machine in my cluster:

listen_address: 192.168.1.51

Then do the same for rpc_address:

rpc_address: 192.168.1.51

Finally we list all of our ips in the seeds section. As I have three nodes in my cluster, the setting on each machine looked like this:

- seeds: "192.168.1.51,192.168.1.52,192.168.1.53"

Additionally if you want to give your cluster a specific name, you can set the cluster_name property. Once all three machines are set up as you like them, you can start up cassandra by going to the bin directory on each and running:

sudo ./cassandra

Using Cassandra

Once all three nodes are running, we can test on one of the nodes with cqlsh and create a database. Cqlsh is a command line program for using Cassandra’s SQL like language called CQL. I connected to the utility from my first node like this:

./cqlsh 192.168.1.51

The first step is to create a keyspace which acts like a schema in a SQL database like Oracle. Keyspaces store columnsets which act like tables and store data with like columns:

>create keyspace testspace with replication = { 'class': 'SimpleStrategy', 'replication_factor': 3 };
>use testspace;

Now we can create our column set and add some rows to it. It looks just like creating a database table:

>create table machines (id int primary key, name text);
>insert into machines values (1, 'beaglebone1');
>insert into machines values (2, 'beaglebone2');
> insert into machines values (3, 'beaglebone3');
>select * from machines;

Now we have a simple set of columns with some rows of data. We can check that replication is working by logging in from a different node and performing the same selection:

./cqlsh 192.168.1.52
>use testspace;
>select * from machines;

So now we have a working arm based database cluster. There is a lot more you can do with Cassandra and some good documentation can be found here. The biggest issue with using the BeagleBone Black for this is the speed of the reads and writes. The SD card is definitely not ideal for real time applications or anything needing performance but this tutorial is certainly applicable for faster arm machines like the Cubieboard and of course desktop clusters as well.