Exclusive: Seafile Founder Daniel Pan Talks About His Open Source Cloud Software

Cloud has become one of the buzzwords of modern computing; the cloud offers so many advantages that it can’t be ignored, and it is becoming an integral part of our IT infrastructure. However, the cloud poses a serious threat to data ownership and raises many privacy questions. The best solution is to ‘own’ your cloud, either through an on-premise cloud running in a local network disconnected from the Internet or through one running on your own secure server. Seafile is one of the most promising open source cloud projects.

It’s enjoying some traction among organizations in Europe. We reached out to the Seafile team to learn more about the project. Here is an interview with the project’s founder and coordinator, Daniel Pan.

Swapnil: How, when and why did the Seafile project start? What was the driving force behind the project?

Daniel Pan: Seafile comes from the idea of making it easy to share files among a number of users. It began five years ago, in mid-2009, when we (Jonathan Xu and I) were still students at Tsinghua University in Beijing, China. Our first attempt was a P2P file syncing system that needed no central server, just like BTSync does.

Later we realised that if we wanted to add group collaboration features, it would be more natural to have servers acting as a central place for collaborating on files. After graduation, we found jobs and worked for big companies for about two years. In March 2012, we began working on Seafile full time. We had six people at the beginning.

Since 2014, Jackson IT has helped the Seafile team maintain the forum for the German community and has given customers great support. In 2015, together we formed the German company Seafile GmbH to promote Seafile in Germany.

Swapnil: Why did you choose to make Seafile Open Source? What are the advantages of using an Open Source development model?

Daniel Pan: We open-sourced Seafile in July 2012 to attract users and contributions. We have been members of the open source community in Beijing since we were students, so we like doing business the open source way. Open source has helped Seafile become an international project and has brought us friends, users and customers worldwide. In China, developing software the open source way is a trend, especially among people at Internet companies. More and more people are actively participating in global projects like Hadoop and OpenStack.

Swapnil: Can you tell us a bit about the organizational structure of the company?

Daniel Pan: Jonathan and I are co-founders of Seafile Ltd. I work as CEO and he works as CTO. Silja Jackson is now CEO of Seafile GmbH (the German subsidiary of Seafile).

Swapnil: How do you compare Seafile with other products like ownCloud? What are the USPs (unique selling points) of Seafile?

Daniel Pan: There are two selling points.

1: Stable, high-performance file syncing. The core function of cloud storage is syncing, and in Seafile we work hard to make file syncing stable and efficient. It is also the hardest part, and it often isn’t done well. You have to design a correct file storage model and a correct syncing algorithm that works not just for a few hundred files but for tens of thousands of files. And you have to support three operating systems (Mac, Windows, Linux), each of which differs in many details of how it stores and handles files. ownCloud’s desktop file syncing is still in a beta state. Seafile’s syncing is much more stable, though there are still a few small corner cases that we haven’t had time to solve yet.
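The interview doesn’t describe Seafile’s internal storage model, so the following is only a minimal Python sketch of the two problems mentioned above, under stated assumptions. It assumes a snapshot is a plain mapping from file path to a SHA-1 content hash (a Git-like idea, not necessarily Seafile’s format), and the names `file_id`, `diff_snapshots` and `case_collisions` are hypothetical. The second helper shows one concrete cross-platform difference: paths that differ only in letter case can coexist on a typical Linux filesystem but collide on the case-insensitive defaults of Windows and macOS.

```python
import hashlib
from typing import Dict, List, Tuple

# Assumption: a snapshot maps a relative path to the SHA-1 of its contents.
Snapshot = Dict[str, str]


def file_id(data: bytes) -> str:
    """Content-address a file: identical contents get identical IDs."""
    return hashlib.sha1(data).hexdigest()


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Return sorted (added, modified, deleted) paths between two snapshots.

    Only these paths need to be transferred; unchanged files are
    recognised by their identical content IDs and skipped.
    """
    added = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return added, modified, deleted


def case_collisions(paths: List[str]) -> List[List[str]]:
    """Group paths that differ only in letter case.

    A sync client has to detect these before writing to a
    case-insensitive filesystem, where they would overwrite each other.
    """
    groups: Dict[str, List[str]] = {}
    for p in paths:
        groups.setdefault(p.casefold(), []).append(p)
    return [sorted(g) for g in groups.values() if len(g) > 1]


old = {"a.txt": file_id(b"one"), "b.txt": file_id(b"two")}
new = {"a.txt": file_id(b"one!"), "c.txt": file_id(b"three")}
print(diff_snapshots(old, new))  # (['c.txt'], ['a.txt'], ['b.txt'])
print(case_collisions(["Report.doc", "report.doc", "notes.txt"]))  # [['Report.doc', 'report.doc']]
```

Comparing content IDs rather than timestamps avoids re-uploading files that were merely touched. A real client with tens of thousands of files could also hash whole directories, so that unchanged subtrees are skipped without visiting every file.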

There are at least 10 private cloud products around the world, but only a few can work in critical scenarios. For example, one of Seafile’s users is a company in Poland. They use Seafile instead of SVN to manage documents for a production design team of 170 people working on thousands of frequently modified files. SVN works reliably, but not efficiently (you need to check out files, edit them, then commit them back). Seafile improves the workflow, but they are facing new problems such as file conflicts. We are working together with them to improve Seafile for this kind of heavy, highly concurrent use.
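File conflicts of the kind the Polish customer ran into arise when two people change the same file between syncs. A common way to detect them (assumed here, not taken from Seafile’s source) is a three-way comparison of each file’s content ID against the version recorded at the last successful sync. The `Action` enum and `reconcile` function are hypothetical names for illustration.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NOTHING = "nothing"    # both sides already agree
    UPLOAD = "upload"      # only the local copy changed
    DOWNLOAD = "download"  # only the server copy changed
    CONFLICT = "conflict"  # both copies changed, in different ways


def reconcile(base: Optional[str], local: Optional[str], remote: Optional[str]) -> Action:
    """Decide how to sync one file from three content IDs.

    base   -- ID recorded at the last successful sync (None: file did not exist)
    local  -- ID of the copy on this machine (None: absent locally)
    remote -- ID of the copy on the server (None: absent on the server)
    """
    if local == remote:
        return Action.NOTHING   # identical, or deleted on both sides
    if local == base:
        return Action.DOWNLOAD  # untouched locally: apply the server's change (may be a delete)
    if remote == base:
        return Action.UPLOAD    # untouched on the server: push the local change (may be a delete)
    return Action.CONFLICT      # both sides diverged from the common base


print(reconcile("v1", "v2", "v1"))  # Action.UPLOAD
print(reconcile("v1", "v2", "v3"))  # Action.CONFLICT
```

When `reconcile` returns `CONFLICT`, a cautious client keeps both versions, for example by saving one as a renamed conflict copy, rather than silently discarding someone’s edits.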

2: Easy file sharing with groups, and client-side encryption. In Seafile, files are organized into collections called libraries. This makes Seafile a little more complex than Dropbox, but it makes it easier for users to share files with groups (users can create groups in Seafile and share libraries with them), and it also enables encrypted libraries for client-side encryption.
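The interview doesn’t spell out how Seafile’s encrypted libraries work, so this is only a generic sketch of the idea behind client-side encryption: the key is derived from the library password on the user’s own machine, and the server stores just a random salt and a check value. It assumes PBKDF2-HMAC-SHA256 with an arbitrary iteration count; `derive_key`, `make_verifier` and `check_password` are hypothetical names. The actual file encryption (with a cipher such as AES under the derived key) is left out because Python’s standard library has no block cipher.

```python
import hashlib
import hmac
import os

# Assumption: PBKDF2-HMAC-SHA256 with an illustrative iteration count.
ITERATIONS = 100_000


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 256-bit key from the library password, on the client only."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def make_verifier(key: bytes) -> bytes:
    """Check value stored with the library; it contains neither password nor key."""
    return hmac.new(key, b"password-check", hashlib.sha256).digest()


def check_password(password: str, salt: bytes, verifier: bytes) -> bool:
    """Re-derive the key and compare check values in constant time."""
    return hmac.compare_digest(make_verifier(derive_key(password, salt)), verifier)


salt = os.urandom(16)  # random per library, stored in the clear
verifier = make_verifier(derive_key("correct horse", salt))
print(check_password("correct horse", salt, verifier))  # True
print(check_password("wrong guess", salt, verifier))    # False
```

Because the salt and check value live on the server, anyone holding them can still try password guesses offline, which is why a slow key derivation function and a strong library password both matter.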

Swapnil: Recently a university in Rhineland-Palatinate chose Seafile. Can you tell us about the deployment? Are they doing it independently, or will they be working with your team for deployment and support?

Daniel: It is not a single university but the universities of the state of Rhineland-Palatinate. They will first deploy Seafile for a single university (Mainz University), then extend the deployment to the other universities. Currently they use shared storage as the backend storage and a MariaDB cluster as the database. Seafile and MariaDB run on three machines.

They call us once every few days to discuss the problems they face and to give us suggestions, based on feedback from their users (students), on how to improve Seafile. They plan to use Ceph as the file storage backend for Seafile.

Swapnil: In addition to this university, are there any other major deployments of Seafile? Can you tell me a bit more about them?

Daniel: Humboldt University of Berlin (HU Berlin) is also testing Seafile. They communicate with us via GitHub: whenever they encounter a problem or bug, they submit an issue on GitHub.

Swapnil: You also offer an online cloud service. Can you tell us about the IT infrastructure of the cloud?

Daniel: The cloud service is deployed on Amazon Web Services. We use the MySQL database and S3 storage they offer. It works quite smoothly, and we don’t need to worry too much.

The hardest part of a large-scale private cloud is storage. We have only two options: OpenStack Swift and Ceph. I don’t have much experience with Swift. Some customers are choosing Ceph, but it is not easy to maintain a Ceph cluster, and we don’t have enough confidence in Ceph’s stability yet. We only use Ceph’s block storage layer, which is production-ready in terms of code quality. But Ceph is not mature enough in terms of maintenance, as it lacks documentation and tools, and you need to understand Ceph’s internal mechanisms to use it correctly.
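Because storage is the hard part, and the hosted service relies on S3 while customers are weighing Swift or Ceph, it helps a server to keep storage behind a narrow interface. The sketch below assumes such an interface (`BlockStore`, hypothetical) with one local-directory implementation that stores each content-addressed block once; an S3, Swift or Ceph backend would implement the same two methods with that system’s own client library. It illustrates the design idea only, not Seafile’s actual code.

```python
import hashlib
import os
import tempfile
from abc import ABC, abstractmethod


class BlockStore(ABC):
    """Hypothetical storage interface the rest of a sync server codes against."""

    @abstractmethod
    def put(self, data: bytes) -> str:
        """Store a block and return its ID."""

    @abstractmethod
    def get(self, block_id: str) -> bytes:
        """Return the block with the given ID."""


class DirectoryBlockStore(BlockStore):
    """Stores each block as a file named after its SHA-1 digest."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, block_id: str) -> str:
        # Fan out into 256 subdirectories so no single directory grows huge.
        return os.path.join(self.root, block_id[:2], block_id[2:])

    def put(self, data: bytes) -> str:
        block_id = hashlib.sha1(data).hexdigest()
        path = self._path(block_id)
        if not os.path.exists(path):  # identical blocks are stored only once
            os.makedirs(os.path.dirname(path), exist_ok=True)
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                f.write(data)
            os.replace(tmp, path)  # rename into place (atomic on POSIX)
        return block_id

    def get(self, block_id: str) -> bytes:
        with open(self._path(block_id), "rb") as f:
            return f.read()


with tempfile.TemporaryDirectory() as root:
    store: BlockStore = DirectoryBlockStore(root)
    block_id = store.put(b"hello")
    print(store.get(block_id))  # b'hello'
```

Swapping backends then means writing one new class, while the syncing logic above it stays unchanged.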

Swapnil: Do you use Linux in the back-end? Which distribution do you use and what was the reason behind picking it?

Daniel: We use Ubuntu as our working desktop, because it is easier to use than other Linux desktops. I used Fedora before and switched to Ubuntu in 2006.

Swapnil: What other open source components do you use for the service and your own infrastructure?

Daniel: For the desktop client, we use SQLite for the database and Qt for the GUI. For the server, we use Python/Django as the web framework, Backbone.js and jQuery on the browser side, libevent as the framework for the backend service, Elasticsearch for search, and LibreOffice and pdf2htmlEX for office file previews.

Swapnil: One major difference between Seafile and other similar open source projects such as RHEL or ownCloud is that the paid version has many features missing from the community version. Doesn’t that affect the potential user base? How do you justify keeping some secret sauce in the paid version?

Daniel: Big companies like Red Hat can earn money from support contracts with big customers, while Seafile can’t. Almost all of our paying customers have fewer than 50 users. They buy functionality rather than support. We also don’t have the capacity or manpower to support big customers. So we have to sell features. If you look carefully at the features exclusive to the pro edition, you will find that they are actually small, nice-to-have features that are not critical for most users. The core file syncing and sharing function is no different.

Swapnil: How secure is Seafile in the post-Snowden world? Is user data secure on Seafile servers?

Daniel: Self-hosting your files and not letting others know your server’s address, or protecting it with a firewall, makes it very secure. This is the selling point of self-hosted cloud products. In addition, Seafile offers client-side encryption. Since the code is open source, people review it and make sure there are no flaws in our security design and no security-related bugs in the code. If they find any, we fix them quickly.

Swapnil: Do you actively interact with other open source projects?

Daniel: We contribute to the libraries we use in Seafile, like libzdb and libevhtp, mostly via GitHub. We haven’t yet contributed back to the bigger projects we use, like Django and Qt. (You know, it is not easy to contribute to large projects.)