The issue of open data is on the minds of a lot of technology users and watchers of late, thanks to the media blitzkrieg surrounding the WikiLeaks Web site and the legal battles facing WikiLeaks founder Julian Assange. In the face of such a controversial issue, it’s only natural to ask who owns data, and what rights its owners have to use and keep it.
A more pedestrian, and far less controversial, application of the open data issue is this: who owns the data you put on everyday Web services, such as Facebook, Yahoo!, or one of the many Google Web applications? The assumed answer, “Why, me, of course,” may not actually be true.
The Problem
The assumption fails because data on the various Web services is often locked into the service itself. If Gmail were to go off the air tomorrow, how many millions of users would lose not only the ability to communicate, but also all of their archived messages? Are messages regularly backed up? In fact, how does one back up Gmail? (Hint: use a standalone POP or IMAP client and pull down the messages to your local machine… and hope you have enough storage capacity.)
And that’s just Gmail: what about all the other Google services? Google’s own solution, the Data Liberation Front, is a good start, but who has an open source stack equivalent to Google’s to which to move the data? Without such a stack, what good is exporting a Google user’s data?
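For readers who would rather script the backup than configure a mail client, the IMAP approach hinted at above can be sketched in a few lines of Python using the standard library's imaplib and mailbox modules. This is a minimal sketch, not a complete backup tool. The function names `backup_folder` and `backup_gmail_inbox` and the output file name are made up for illustration, and the sketch assumes that IMAP access is enabled on the account and that the account accepts a password login (many accounts require an application-specific password or OAuth instead).

```python
import email
import imaplib
import mailbox


def backup_folder(conn, folder, mbox_path):
    """Copy every message in `folder` from an IMAP connection into a local
    mbox file and return the number of messages saved."""
    # Open the folder read-only so the backup never changes message flags.
    status, _ = conn.select(folder, readonly=True)
    if status != "OK":
        raise RuntimeError(f"could not open folder {folder!r}")

    # Ask the server for the sequence numbers of all messages in the folder.
    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")

    archive = mailbox.mbox(mbox_path)  # created if it does not exist yet
    saved = 0
    try:
        for num in data[0].split():
            # RFC822 asks for the complete raw message, headers and body.
            status, parts = conn.fetch(num, "(RFC822)")
            if status != "OK":
                continue  # skip messages the server refuses to return
            raw_message = parts[0][1]
            archive.add(email.message_from_bytes(raw_message))
            saved += 1
        archive.flush()
    finally:
        archive.close()
    return saved


def backup_gmail_inbox(address, password, mbox_path="gmail-backup.mbox"):
    """Hypothetical convenience wrapper: log in to Gmail over IMAP and back up
    the INBOX. Assumes password login is permitted for this account."""
    conn = imaplib.IMAP4_SSL("imap.gmail.com")
    conn.login(address, password)
    try:
        return backup_folder(conn, "INBOX", mbox_path)
    finally:
        conn.logout()
```

The resulting mbox file is a plain-text format that most desktop mail clients can import, which is the point: the messages end up in a form that no single service controls. The storage caveat still applies, since every message is downloaded in full.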
Then there is the issue of control, another hallmark of ownership. Who sees the data on Gmail? What processes are looking at those messages in the Inbox, seeing a few occurrences of the word “habit,” and concluding an ad for a twelve-step program would be appropriate to display?
For all the emphasis the open source community places on the freedom of the software it uses, that same community is often willing to sacrifice the same measure of freedom for its data, a far more personal aspect of its members’ online lives.
Open data is a solution to a problem that many users may not recognize they have: how to keep their data accessible and under their control in an environment where that data is increasingly online, typically out of their control, and (at times) inaccessible.
Creating an Open Web
One strong proponent of open data on the Web is Stormy Peters, the former director of the GNOME Foundation and currently head of Mozilla’s developer engagement program. Peters often speaks about the importance of open data, addressing the topic at this year’s OSCON and at the Ohio LinuxFest, where this reporter attended her presentation. Peters cited several examples of services where data can get out of control. Facebook, for example, notoriously holds onto user data rather tightly, to the point where it’s difficult for the users themselves to completely export the data to another service, should they choose. Not to mention Facebook’s ongoing privacy travails.
For the average user, Facebook’s privacy policy has expanded significantly since its beginnings, and opting out of individual privacy changes is a byzantine, time-consuming process. There are some third-party solutions: Matt Pizzimenti founded ReclaimPrivacy.org this spring as one way to deal specifically with data and privacy in Facebook.
Using ReclaimPrivacy.org is simple: just drag the bookmarklet up to your bookmarks, log into Facebook, and surf to your privacy settings. Once on the page, click the bookmarklet link, and the script will scan for slack privacy settings.
Facebook is just one example in a very wide world of data and privacy problems. Many Web-enabled services used today have a less-than-open data policy in place.
Businesses have a vested interest in open data as well. Cloud-based services are hugely popular, and their providers vow to protect customer data. But terms-of-use policies must be closely examined to determine exactly what happens to customer data if the terms of service are violated. What recourse does the customer have? Are there appeal processes in place for disputes? And, most critically, what happens to the customer’s data if the service is cut off? Entire businesses could collapse in the days it takes to get data access sorted out during a terms-of-service dispute.
Peters notes some positive exceptions in her presentations: Identi.ca, the open source microblogging service, not only has open code, but open data policies as well. Similar openness is expected to be found on Diaspora, an open source version of Facebook’s social services now in alpha development.
But will “open” be enough of a draw? Web service users might not want to go through the hassle of moving to more open systems just for privacy’s sake, which will certainly lower customer pressure for open data. And, with so much money involved with mining and keeping customer data, don’t expect openness to magically spread through service providers’ policies without significant customer pressure.