Wrangling Data Visualization with Gnuplot 4.6

Let’s face it: everybody likes having data, but nobody likes staring at a column of numbers. If you cannot sculpt your raw data into a visualization that either illuminates the problem or helps you find the solution, then you’re only halfway done. Luckily there are tools like Gnuplot available, which allow you to manually or automatically generate high-quality visual representations of your data sets. The new release, 4.6, adds some important features from a mathematical standpoint — and just as importantly, many updates to the output framework, including support for generating interactive HTML5 displays.

Star Chart-er

At its core, Gnuplot is an engine for streaming in data, massaging it, and producing visual output. The input portion of the process can be interactive, read from a file, or generated by another program — that is why Gnuplot is often used as the graphing engine for other data-centric applications. Examples include the GRASS geographic information system and the Octave numerical computation framework, as well as third-party tools like Pig, the data-analysis platform for Apache Hadoop, or the Puppet configuration-management tool.
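As a rough sketch of those three input paths, the snippet below plots data typed inline, data read from a file, and data produced by another program. The file name data.txt and the sort pipeline are placeholders, and the "<" pipe form assumes a platform where Gnuplot can spawn a shell command.

```gnuplot
# 1. Inline data: '-' reads points from the command stream until a lone "e"
plot '-' using 1:2 with linespoints title "inline"
1 2
2 4
3 8
e

# 2. Data read from a file (data.txt is a hypothetical two-column file)
plot "data.txt" using 1:2 with lines title "from file"

# 3. Data generated by another program: a leading "<" runs a shell command
plot "< sort -n data.txt" using 1:2 with lines title "from pipe"
```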

Gnuplot’s output is handled by a user-selectable export module called a terminal driver in Gnuplot lingo. It can directly generate image files as output, directly open up a new window in the operating system, or format the output for use in another application (including high-caliber packages like LaTeX). The combined flexibility of the input and output options is what makes Gnuplot such a long-standing favorite for visualization — the project has been in continual development since the 1980s, and adds new features with every release.
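In practice, choosing a terminal driver usually takes two commands: one to pick the driver and one to name the output file. The sketch below assumes the png and latex drivers were compiled into your build; the file names are illustrative.

```gnuplot
# With no arguments, "set terminal" lists the drivers available in this build
set terminal

# Write a PNG image file
set terminal png size 800,600
set output "sine.png"
plot sin(x) title "sin(x)"
set output        # close the file so it is flushed to disk

# Format the same plot as LaTeX source for inclusion in a document
set terminal latex
set output "sine.tex"
plot sin(x) title "sin(x)"
set output
```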

You can download the latest 4.6 release from the project’s home page. There are source code bundles for Linux at present; all major Linux distributions package Gnuplot, so updated packages should be available shortly from your distribution’s package management system. An interesting side-effect of Gnuplot’s long development history is that the package actually pre-dates the GPL and most other standardized FOSS licenses by several years (as a matter of fact, it is not part of the GNU project at all; the similarity in names is coincidental). The Gnuplot license is very similar to the permissive free software licenses most of us are familiar with, the only difference being that it is harder to take pieces of the Gnuplot source and patch them into another product.

The current maintainers are well aware of the headaches this causes, and it may change in the future if they can work out the details necessary for relicensing, but for now be sure to read the license carefully if you want to incorporate Gnuplot code into another project. Note, however, that this has absolutely no effect on using or deploying Gnuplot in production; only on creating new, derivative software.

New: Structured Functions, Drivers, and Statistics

The two biggest features debuting in 4.6 are support for structured blocks of code in the scripts that define a Gnuplot process, and a set of new terminal drivers implementing the latest-and-greatest of output options. The structured code blocks allow Gnuplot to process iterative “for” loops or if-then-else logical constructs with compound, multi-line blocks of code in each section. This is a feature that users have requested for ages, since without it they have always needed to wrap Gnuplot directives inside Perl, Python, or some other language. For an example of what this looks like, a Gnuplot script might iterate through the Leibniz formula for calculating π/4 to varying degrees of precision:

 

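set multiplot layout 2,2
# k-th term of the Leibniz series for pi/4; 2.0 forces floating-point division
leibniz(k) = ((-1)**k)/(2.0*k+1)
do for [power = 1:4] {
   TERMS = 10**power
   set title sprintf("%g term summation",TERMS)
   # the series starts at k=0, so TERMS terms run from k=0 to k=TERMS-1
   plot sum [k=0:TERMS-1] leibniz(k) notitle
}
unset multiplot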
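
The if-then-else blocks can be nested inside a loop in the same way. Here is a minimal sketch with illustrative variable names; note that the else keyword sits on the same line as the closing brace of the if block.

```gnuplot
do for [n = 1:4] {
    if (n % 2 == 0) {
        # multi-line bodies are allowed inside the braces
        kind = "even"
        print sprintf("%d is %s", n, kind)
    } else {
        kind = "odd"
        print sprintf("%d is %s", n, kind)
    }
}
```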

 

The new terminal driver options offer you a variety of things to do with the output. There is a new Qt terminal, a Lua terminal, and a terminal for the ConTeXt macro package, as well as a pair of terminals using the Cairo graphics engine — one that produces EPS output, and one that produces LaTeX. The EPS and TeX options are geared towards print users, but the others allow developers to build Gnuplot graphing into other applications. The Qt driver, for instance, can pop up a new, stand-alone Qt window, but it can just as easily construct a Qt canvas to be managed by an application.
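
A hedged sketch of how three of the new drivers might be selected follows. It assumes your build includes Cairo and Qt support, and the output file names are placeholders.

```gnuplot
# Print-oriented EPS output through the Cairo engine
set terminal epscairo
set output "figure.eps"
plot x**2 title "x^2"
set output

# cairolatex: graphics go to a PDF, text goes to a .tex file that includes it
set terminal cairolatex pdf
set output "figure.tex"
plot x**2 title "$x^2$"
set output

# Interactive stand-alone Qt window
set terminal qt
plot x**2
```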

Gnuplot can draw several new styles of visualization, such as circle graphs where the radius of the circle shows the magnitude of the data, filled “step” plots (akin to histograms), and boxplots, which summarize statistical metrics like the median, quartile boundaries, and outliers. For statistical work, there is also a new one-word command, stats, that calculates and generates a statistical summary of the data set, including averages, sums, quartiles, and other calculations.
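
The following sketch shows roughly how stats and two of the new plot styles fit together. The file data.txt is hypothetical, with x values in column 1 and y values in column 2, and the radius scaling factor is arbitrary.

```gnuplot
# Print a summary of column 2 and store the results in STATS_* variables
stats "data.txt" using 2
print sprintf("mean = %g, stddev = %g", STATS_mean, STATS_stddev)

# Boxplot of column 2, drawn at x position 1
set style fill solid 0.25 border -1
plot "data.txt" using (1):2 with boxplot notitle

# Circles at (x, y); the third "using" entry sets each radius, here scaled from y
plot "data.txt" using 1:2:(0.05*$2) with circles notitle
```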

There are also several new output directives. Users can define their own line-styles and custom color sequences and save them as defaults. That way, whenever a Gnuplot script generates a graph, you can ensure that the same color scheme is used (which would be particularly valuable, for example, in a web-based data mining tool, where you do not know in advance what data you will be graphing, but you would like to establish a consistent look to the output).
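
One way to set up such a default sequence is the set linetype command. The colors and widths in this sketch are only illustrative; to make them persistent you would typically place the same lines in your Gnuplot initialization file.

```gnuplot
# Redefine the first three default line types
set linetype 1 lc rgb "dark-violet" lw 2
set linetype 2 lc rgb "sea-green" lw 2
set linetype 3 lc rgb "dark-orange" lw 2

# Later plots pick up the redefined sequence without any per-plot styling
plot sin(x), cos(x), sin(x)*cos(x)
```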

Improvements: Time, Encoding, Smoothing, and HTML5

Many other existing functions in Gnuplot have been revised in this release. Time formats can go down to microsecond precision, polar coordinate plots can be drawn in a variety of new styles and the axes customized just like the existing rectangular plots, and many more multi-byte text encodings are supported, such as UTF-8 and the Shift JIS encoding used for Japanese.
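
For readers new to these settings, here is a minimal sketch of a time-series plot with UTF-8 labels. The file log.txt is hypothetical, holding ISO-style timestamps in column 1 and temperature readings in column 2; the format strings are assumptions about that file rather than anything specific to 4.6.

```gnuplot
set encoding utf8                      # allow UTF-8 text in labels and titles
set xdata time                         # treat x values as dates and times
set timefmt "%Y-%m-%dT%H:%M:%S"        # how timestamps appear in the input file
set format x "%H:%M"                   # how tick labels are printed on the axis
set ylabel "Temperature (°C)"
plot "log.txt" using 1:2 with lines notitle
```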

Gnuplot is available for a number of platforms, and 4.6 introduces several improvements to the Windows builds to bring them closer to par with Linux and other Unix-like systems: a GUI front-end, a real application installer, and built-in help. The Windows terminal driver has been modernized, too, supporting output with alpha transparency, anti-aliasing, and other graphics features found on other platforms already.

Every platform benefits from new curve-smoothing algorithms, which draw smoother 2D and 3D forms fitted to the data points. But arguably the most important enhancement comes to the two HTML5 terminal driver outputs: the HTML <canvas> driver and the SVG driver. The canvas driver creates JavaScript-driven plots, while the SVG driver creates standards-compliant, XML-based SVG vector images. The capabilities are roughly equal; which you choose depends on the output mechanism you desire. But both drivers make use of dynamic HTML5 elements to create interactive graph objects, in which your site visitor can zoom in and out of the chart, pan around, and toggle the visibility of labels, axes, and individual data series at will.
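
A sketch of how the two web-oriented drivers might be invoked follows. The file names are placeholders, and the canvas page relies on JavaScript support files shipped with Gnuplot, so their location (adjustable with the jsdir option) is an assumption about your installation.

```gnuplot
# Stand-alone HTML page with a mouse-enabled <canvas> plot
set terminal canvas standalone mousing size 640,480
set output "plot.html"
plot sin(x) title "sin(x)", cos(x) title "cos(x)"
set output

# Interactive SVG with the mousing script embedded in the file
set terminal svg mouse standalone size 640,480
set output "plot.svg"
plot sin(x) title "sin(x)", cos(x) title "cos(x)"
set output
```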

For an example of this type of output, consider interactive stock or weather web sites; most of the time, you want visitors to be able to adjust the parameters of the visualization and see the results quickly. The improved HTML canvas and SVG drivers effectively make Gnuplot capable of generating interactive web content like you might expect to find on a full-blown Ruby- or Python-based web application — but it does so without the overhead of an application framework.

The Plot Thickens

Gnuplot is an increasingly rare specimen among open source projects, because it is equally popular as a stand-alone tool (for running analysis on data sets collected separately) and as a library-like utility for other data-driven applications. What that tells you is that good visualizations speak louder than words. Whether you are benchmarking a new filesystem one time, charting cash flow on a regular basis, or generating interactive HTML5 plots while you mine data, Gnuplot can get the job done.

The new features, particularly structured code blocks, are only going to expand what people can do with the system. The update is new enough that we are only beginning to see practical uses for the expanded capabilities, but one thing is certain — we will see them in a wide range of applications.