Why Open Source Matters to Alibaba


Alibaba has more than 150 open source projects and is a long-time contributor to many others. Wei Cao, a Senior Staff Engineer at Alibaba, says that sharing knowledge and receiving feedback from the community helps Alibaba refine its projects. We spoke with Wei Cao, who heads the Alibaba Cloud Database Department and leads R&D for the Alibaba RDS and POLARDB products, to learn more about the company’s open source focus and about some of the database-related projects it contributes to.

Linux.com: Why is open source so important for Alibaba?

Wei Cao: At present, Alibaba has more than 150 open source projects. We work on open source projects with the aim of contributing to the industry and solving real-life problems, and we share our experience with other open source enthusiasts.

Wei Cao, Senior Staff Engineer at Alibaba

As a long-time contributor to various other open source projects, Alibaba and Alibaba Cloud have fostered a culture that encourages our teams to voluntarily contribute to open source projects, either by sharing experiences or by helping others solve problems. Sharing and contributing to the community are part of Alibaba’s cultural DNA.

When we first started using open source projects like MySQL, Redis, and PostgreSQL, we received a lot of help from their communities. Now we would like to give back to those same communities by sharing our accumulated knowledge and receiving feedback so that we can refine our projects.

We believe this truly represents the essence of open source development, where everyone can build on each other’s knowledge. We are dedicated to making our technology inclusive through continuously contributing to bug-fixing and patch optimization of different open source projects.

Linux.com: What kind of culture does Alibaba have that encourages its developers to consume and contribute to open source projects?

Wei Cao: Alibaba has always had a culture of integrity, partnership, sharing, and mutual assistance. At the same time, we have always believed that broader participation in the community advances the industry and also makes us more profitable. Therefore, our staff pay close attention to open source projects in the community. They keep using open source projects, accumulating experience, giving feedback on the projects, and jointly promoting the development of the industry.

Linux.com: Can you tell us what kind of open source projects you are using in your company?

Wei Cao: Our database products use many open source projects, such as MySQL, Redis, and PostgreSQL. Our teams have made feature and performance enhancements and optimizations for various use cases. For example, we have added compression for IoT and security improvements for the financial industry.

Linux.com: Can you tell us about the open source projects that you have created?

Wei Cao: We will be releasing a new open source project, called Mongo-Shake, at the LC3 Conference. Based on MongoDB’s oplog, Mongo-Shake is a universal platform for services.

It reads the oplog (operation log) of a MongoDB cluster, replicates the MongoDB data, and then uses the operation log to implement specific requirements. The log can support many scenario-based applications.
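Mongo-Shake’s own code is not shown here. As a rough illustration of how data can be replicated by replaying a MongoDB oplog, the following Python sketch applies simplified oplog-style entries to an in-memory store. The entry fields (`ts`, `op`, `ns`, `o`, `o2`) follow the general shape of MongoDB oplog documents. Several parts are simplifying assumptions: the store layout, the function names, the use of integer timestamps instead of BSON timestamps, and the restriction to `$set`-style updates.

```python
from typing import Any, Dict, List

# Hypothetical target store: namespace ("db.collection") -> {_id: document}
Store = Dict[str, Dict[Any, Dict[str, Any]]]


def apply_oplog_entry(store: Store, entry: Dict[str, Any]) -> None:
    """Replay one simplified oplog-style entry against the in-memory store.

    "op" is "i" (insert), "u" (update), "d" (delete) or "n" (no-op);
    "ns" is the namespace; "o" is the document or update spec;
    "o2" holds the _id selector for updates.
    """
    op = entry["op"]
    if op == "n":  # no-op entries carry no data changes
        return
    coll = store.setdefault(entry["ns"], {})
    if op == "i":
        doc = dict(entry["o"])
        coll[doc["_id"]] = doc
    elif op == "u":
        # Simplification: only "$set"-style updates are handled; a missing
        # document is created as a stub holding just its _id.
        doc_id = entry["o2"]["_id"]
        coll.setdefault(doc_id, {"_id": doc_id}).update(entry["o"]["$set"])
    elif op == "d":
        coll.pop(entry["o"]["_id"], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


def replay(store: Store, entries: List[Dict[str, Any]]) -> Store:
    # Entries are applied in timestamp order to reproduce the source state.
    for entry in sorted(entries, key=lambda e: e["ts"]):
        apply_oplog_entry(store, entry)
    return store
```

Ordering matters because an update or delete is only meaningful after the insert it refers to. A real replicator also has to handle commands (`"c"` entries), checkpoints, and idempotent re-application after restarts, none of which this sketch attempts.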

Through the operation log, we provide log data subscription based on a publish/subscribe (pub/sub) model. Subscriptions can be flexibly connected through an SDK, Kafka, MetaQ, and so on to adapt to different scenarios, such as log subscription, data center synchronization, and asynchronous cache eviction. Cluster data synchronization is a core application scenario: synchronization is achieved by capturing oplog entries and replaying them. Its application scenarios include:

  • Asynchronous replication of MongoDB data between clusters, eliminating the cost of double writes.

  • Mirror backup of MongoDB cluster data (not supported in this open source version).

  • Offline log analysis.

  • Log subscription.

  • Cache synchronization.

  • Cache hints: log analysis shows which cache entries can be evicted and which can be preloaded, prompting the cache to be updated.

  • Log-based monitoring.
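The publish/subscribe model described above can be sketched as a small in-process dispatcher. This is a hypothetical illustration, not Mongo-Shake’s API. `OplogPublisher` and `make_cache_evictor` are invented names, and delivery here is synchronous rather than through an SDK, Kafka, or MetaQ. The cache is assumed to be keyed by `(namespace, _id)`. A subscriber for one namespace evicts cached entries whose source documents were updated or deleted, while a wildcard (`"*"`) subscriber collects every entry, for example for offline log analysis.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Entry = Dict[str, Any]
Handler = Callable[[Entry], None]


class OplogPublisher:
    """Hypothetical fan-out of oplog entries to subscribers by namespace."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, ns: str, handler: Handler) -> None:
        # ns is an exact namespace such as "shop.users", or "*" for all entries.
        self._subs[ns].append(handler)

    def publish(self, entry: Entry) -> None:
        # Deliver to subscribers of this namespace, then to wildcard subscribers.
        for handler in self._subs.get(entry["ns"], []) + self._subs.get("*", []):
            handler(entry)


def make_cache_evictor(cache: Dict[Any, Any]) -> Handler:
    """Build a handler that evicts cached documents changed at the source."""

    def on_entry(entry: Entry) -> None:
        if entry["op"] == "u":
            cache.pop((entry["ns"], entry["o2"]["_id"]), None)
        elif entry["op"] == "d":
            cache.pop((entry["ns"], entry["o"]["_id"]), None)

    return on_entry
```

A usage example follows:

```python
cache = {("shop.users", 1): "cached-1", ("shop.users", 2): "cached-2"}
log: List[Entry] = []
pub = OplogPublisher()
pub.subscribe("shop.users", make_cache_evictor(cache))
pub.subscribe("*", log.append)
pub.publish({"op": "u", "ns": "shop.users", "o2": {"_id": 1}, "o": {"$set": {"x": 1}}})
# cache now holds only ("shop.users", 2); log holds the update entry
```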

Linux.com: Can you tell us about the major open source projects you contribute to?

Wei Cao: We have contributed to many database-related open source projects. In addition, we have open sourced projects such as AliSQL and ApsaraCache, which are widely used within Alibaba.

AliSQL: AliSQL is a MySQL branch developed by the Alibaba Cloud database team. It serves Alibaba’s business and Alibaba Cloud’s RDS, has been verified on many Alibaba workloads, and is widely used within Alibaba Cloud. The latest AliSQL also merges many useful enhancements from other branches, such as Percona, MariaDB, and WebScaleSQL, and contains many patches drawn from Alibaba’s experience.

AliSQL adds many feature and performance enhancements on top of MySQL. It includes more than 300 patches: we have added many monitoring indicators and features, and optimized it for different use cases.

In general test cases, AliSQL shows a 70% performance improvement over the official MySQL version, according to the R&D team’s sysbench benchmarks. In comparison with MySQL, AliSQL offers:

  • Better support for TokuDB, more monitoring and performance optimization.

  • CPU time statistics for SQL queries.

  • Sequence support.

  • Dynamic column addition.

  • Thread pool support.

  • Many bug fixes and performance improvements.

Michael “Monty” Widenius, the founder of MySQL and MariaDB, has praised Alibaba for open sourcing AliSQL. We received a lot of help from the open source community in the early development of AliSQL.

Open sourcing AliSQL is the best contribution we have made to this community. We hope to continue our open source journey in the future. Full cooperation with the open source community can make the MySQL/MariaDB ecosystem more robust.

ApsaraCache: ApsaraCache is based on Redis 4.0, with additional features and performance enhancements. Compared with Redis, ApsaraCache’s performance gains depend on the usage scenario rather than on data size. It performs better in cases such as short connections, full memory recovery, and time-consuming command execution.

Multi-protocol support

ApsaraCache supports both the Redis and Memcached protocols, with no changes required to client code. In Memcached mode, users can persist data just as they would with Redis.

By reusing the Redis architecture, we have added new features to Memcached, such as persistence, disaster tolerance, backup and recovery, slow log auditing, and statistics reporting.
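Supporting both protocols without client changes means the server must accept two different wire formats. The sketch below encodes comparable operations in each format. Redis clients send commands as RESP arrays of bulk strings. Memcached clients use the text protocol, which sends a `set <key> <flags> <exptime> <bytes>` line followed by the data block. The helper names are illustrative. This sketch does not cover how ApsaraCache selects between the two protocols, for example through its Memcached mode.

```python
def resp_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings (Redis protocol)."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        # Each bulk string is "$<byte length>\r\n<bytes>\r\n".
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def memcached_set(key: str, value: str, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode a Memcached text-protocol "set" storage command."""
    data = value.encode()
    return b"set %s %d %d %d\r\n%s\r\n" % (key.encode(), flags, exptime, len(data), data)


def memcached_get(key: str) -> bytes:
    """Encode a Memcached text-protocol "get" retrieval command."""
    return b"get %s\r\n" % key.encode()
```

For example, `resp_command("SET", "k", "hello")` produces `*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$5\r\nhello\r\n`, while `memcached_set("k", "hello")` produces `set k 0 0 5\r\nhello\r\n`. A server that speaks both protocols has to parse both framings.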

Ready for production

ApsaraCache has proven to be very stable and efficient through four years of technical refinement and tens of thousands of tests in production environments.

The major improvements in ApsaraCache are:

  • Deep disaster-tolerance reinforcement: the kernel synchronization mechanism has been refactored to avoid the full synchronization that the native kernel triggers when replication is interrupted under weak network conditions.

  • Memcached protocol compatibility, with dual-replica support for Memcached, offering a more reliable Memcached service.

  • In short-connection scenarios, ApsaraCache delivers a 30% performance increase over vanilla Redis.

  • A hot upgrade feature that can update an instance within 3 ms, reducing the impact of frequent kernel upgrades on users.

  • AOF reinforcement, which solves host stability problems caused by frequent AOF rewrites.

  • ApsaraCache health detection mechanism.
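The disaster-tolerance item above refers to full synchronizations triggered when replication is interrupted. In upstream Redis, a reconnecting replica may get either a partial or a full resynchronization. The outcome depends on whether its replication ID matches the primary’s and whether its requested offset still falls inside the primary’s replication backlog. If a weak network keeps the replica disconnected long enough for the backlog to be overwritten, a full sync is required. The Python sketch below models that decision in simplified form. The `Backlog` class and `resync_mode` function are hypothetical. This is not ApsaraCache’s refactored mechanism, which the article does not describe.

```python
from dataclasses import dataclass


@dataclass
class Backlog:
    """Simplified model of a primary's replication backlog."""

    repl_id: str       # replication ID of the primary's data history
    start_offset: int  # replication offset of the oldest byte still held
    length: int        # number of bytes currently held in the backlog

    @property
    def end_offset(self) -> int:
        return self.start_offset + self.length


def resync_mode(backlog: Backlog, replica_repl_id: str, replica_offset: int) -> str:
    """Return "partial" if the replica can catch up from the backlog, else "full".

    Loosely follows the shape of Redis's PSYNC check: the replication ID must
    match, and the requested offset must lie within the backlog window.
    """
    if replica_repl_id != backlog.repl_id:
        return "full"
    if replica_offset < backlog.start_offset or replica_offset > backlog.end_offset:
        return "full"
    return "partial"
```

In upstream Redis, the backlog size (the `repl-backlog-size` configuration directive) bounds how far behind a replica can fall before reconnecting forces a full sync. Under an unstable network, this window is the constraint a reworked synchronization mechanism would aim to relax.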

This article was sponsored by Alibaba and written by Linux.com.