Author: Stephen Feller
Chief among the improvements are new replication abilities. Rex Wang, vice president of marketing for Sleepycat, said replication is what makes the software scalable and highly available. The new in-memory replication makes the software faster, and client-to-client replication means the master no longer has to send every update to every replica itself. With the new function, replicas can distribute updates to other replicas, he said, which frees up the master for other tasks. Also, if the volume of writes exceeds the capacity of the master, it is possible to partition the data and create several masters, each handling a separate segment of the data.
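To make the partitioning idea concrete, here is a minimal sketch of one common way to split data among several masters: hash each record key and use the result to choose a master. The article does not say how Berkeley DB applications actually partition their data, so the function name `pick_master` and the host names are purely illustrative assumptions, not part of the Berkeley DB API.

```python
import hashlib


def pick_master(key: bytes, masters: list[str]) -> str:
    """Choose the master responsible for a key (hypothetical helper).

    A stable hash is used so that the same key always maps to the
    same master, no matter which process performs the lookup.
    """
    if not masters:
        raise ValueError("at least one master is required")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes of the digest as an integer and
    # reduce it to an index into the list of masters.
    index = int.from_bytes(digest[:8], "big") % len(masters)
    return masters[index]


# Illustrative host names only.
MASTERS = ["master-a", "master-b", "master-c"]

if __name__ == "__main__":
    for key in (b"customer:1001", b"customer:1002", b"order:77"):
        print(key.decode(), "->", pick_master(key, MASTERS))
```

Each master then receives only the writes for its own segment of keys. A simple modulo scheme like this one reshuffles most keys when a master is added or removed, so a real deployment might prefer a different mapping.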
“It’s the idea of a master database that distributes copies of itself to a bunch of replicas, and therefore can spread the workload and improve scalability,” Wang said. “If any one of the replicas goes down, the other replicas can pick up for it. There’s no loss of service.”
Developers added new online Btree functions, compaction, and disk space reclamation in response to customer requests, Wang said. A Btree keeps the number of child nodes in each internal node within a fixed range. This keeps the tree shallow, so only a few nodes have to be visited between the root and any given record, which speeds up access. The new functions work while the system is online, he said, eliminating the need to restart a database, which should also make the database faster and easier to manage.
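A rough calculation shows why the number of children per node matters. The sketch below uses a simplified model, assumed here for illustration, in which every node has exactly `fanout` children. Real Btrees, including Berkeley DB's, allow the count to vary within a range, so this is an idealization and not the library's implementation. The function name `levels_needed` is hypothetical.

```python
def levels_needed(num_records: int, fanout: int) -> int:
    """Smallest number of levels L with fanout**L >= num_records.

    Models an idealized tree in which every node has exactly
    `fanout` children; each level multiplies capacity by `fanout`.
    """
    if num_records < 1:
        raise ValueError("num_records must be at least 1")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels = 1
    capacity = fanout
    while capacity < num_records:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    # One million records: a binary tree versus a wide tree.
    print(levels_needed(1_000_000, 2))    # 20
    print(levels_needed(1_000_000, 100))  # 3
```

Under this model, a lookup among a million records passes through 20 levels when each node has two children, but only three when each node has 100. Fewer levels means fewer disk pages to read per lookup.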
In addition to improving basic functionality, Berkeley DB’s new features include in-memory replication, a hot backup utility, delayed client synchronization, and faster master elections, Sleepycat said in a press release.
Sleepycat developers’ priority for Berkeley DB 4.4 was improving performance and availability, Wang said, adding that their aim was to let the software keep functioning despite hardware or software failures. He added that online Btree compaction and abandoned lock removal were the improvements customers requested most.
A month ago the company also launched a Developer Zone Web site, offering a central location for downloads and resources to help developers better take advantage of the software.
Building on what Wang called the company’s belief in a “two-way interaction” with customers and developers, Sleepycat is seeking to build a Berkeley DB-specific community for them. He said the Developer Zone will let customers download the product, read documentation, and access other materials such as FAQs, technical articles, and videos, as well as tap the knowledge of their fellow developers through mailing lists and newsgroups.
“We’re going to continue to enhance it,” Wang said. “Open source companies have a reliance on the community…. We invest a substantial amount of our effort on the community and we’re very grateful in return for what the community gives us.”
Wang said Berkeley DB has been deployed by a variety of vendors — from Internet-based companies like Amazon and Google, to many distributions of server and desktop Linux, to hardware companies such as Cisco, EMC, Ericsson, and Motorola — at least partially because the software operates under a dual-license model. By offering more options for how its software can be used, Sleepycat found a way for open source software to work in areas beyond the open source community, he said.
This dual-license model has allowed Sleepycat to generate revenue by licensing the software to companies that do not want to release source code and other information about their products. Customers that buy a commercial license can distribute closed source applications that use Berkeley DB.
The company still offers those who do want to release their code the right to use, alter, and redistribute Berkeley DB, keeping the software in the open source realm. The major requirement of the free-use license is that any application using the software must be open source if it is distributed. Because both licenses cover the same code, all of the companies, applications, and developers that use Berkeley DB use the same software, Wang said.
According to Rebecca Hansen, who works in Sun Microsystems’ product marketing department, the company does not offer Berkeley DB directly to customers, but rather embeds it in some of its products such as the Java Enterprise Suite, the office software suites OpenOffice.org and StarOffice, and the Solaris Simple Network Management Protocol (SNMP) system monitoring agent.
Hansen said that Sun plans to migrate products that use Berkeley DB to the new version over the course of 2006, with the first expected sometime in April. She said some of the new features motivated the migration, including automatic lock recovery, system snapshots, and the ability to shrink a database.
While Sun does have software that is open source, not everything it offers is open, and so the company operates using Sleepycat’s commercial license, Hansen said. Buying the commercial license allows Sun to embed Berkeley DB in its applications and not necessarily release the source code.
Wang said Berkeley DB has stuck around for more than a decade partly because it can work in any system and be invisible to the user, and partly because in some cases it is a cheaper, faster replacement for traditional relational databases.
He said the reason Berkeley DB is used in so many ways has to do with Sleepycat’s effort to make it usable by anybody for anything, including proprietary, commercial uses.
“People have a choice [because of the license option],” Wang said. “They can get their hands on our software very easily — the full code, including source and documentation — and they can get support from us, from a company that’s been around a while.”