Buoyant’s New Open Source Service Mesh Is Designed with Kubernetes in Mind


This article is part of the KubeCon + CloudNativeCon North America 2017 series.

The Linkerd service mesh for microservices was the first in its category and is the most widely used service mesh in production today. It has served over a trillion requests and has enterprise customers that include Salesforce, FOX, Target, PayPal, Expedia, AOL, Monzo, and IBM.

Today, Buoyant announced a new, next-gen open source service mesh called Conduit, which was designed to be fast, lightweight, and secure, with real-world Kubernetes and gRPC use cases in mind.

Ahead of KubeCon + CloudNativeCon North America 2017, being held this week in Austin, we spoke to George Miranda, Community Director at Buoyant, the maker of Linkerd. Be sure to catch Buoyant CEO William Morgan’s keynote on Conduit, along with the rest of Buoyant’s talks at the conference. Buoyant will also be kicking off the conference with The New Stack’s Pancake Breakfast.

Linux.com: What makes managing services more challenging in a Cloud Native environment?

George Miranda: When you’re running monolithic applications on three-tier legacy infrastructure, you make relatively few service requests. It’s pretty obvious where they’re coming from and going to. If things go wrong, you can quickly understand where problems might be happening.

For example, you may be monitoring network performance for packet loss, transmission failures, and bandwidth utilization. You probably use a latency monitoring tool, like smokeping, to get closer to measuring service health, and an in-band tool, like tcpdump, to monitor service communication at the packet level. You triage those metrics along with your event logs and you can infer where things are likely going wrong.

If you’ve managed production applications before, you know this game well, and for the most part these tools did the trick. But they require you to know how the entire system operates in order to make that process work. As a platform operator with monolithic apps, you’ll typically have deep intrinsic knowledge of the services in use, how they interact, and how they operate at that layer for the entire system.

When you start building cloud-native applications, that holistic grasp of the entire system can quickly scale beyond any one platform operator’s reach. You could be managing hundreds or thousands of microservices in your infrastructure. Managing things like load balancing, automated deployments, encryption, cascading system failures, or troubleshooting outages can become incredibly complex without visibility into the service communication layer. That’s where the service mesh can help.

Linux.com: What are the advantages of a service mesh?

George Miranda: A service mesh adds visibility into requests that were once invisible. It turns service communication into a first-class citizen. Essentially, it provides the logic to monitor, manage, and control service requests by default, everywhere, and helps you make your microservices safe, fast, and reliable.

The service mesh is typically implemented as a set of network proxies that are deployed alongside your application code. Those proxies are transparent to your applications, so there are no code changes required to use them. That allows developers to decouple service communication logic from application code. So you can push that into a lower part of the stack where it can be more easily managed globally across your entire infrastructure. You can use that mesh to weave applications deployed between different infrastructure platforms, data centers, and cloud providers into a single fabric. We’ve had customers use the service mesh as a way of reducing lock-in risk and enabling hybrid multi-cloud deployments.

Linux.com: How does a service mesh work?

George Miranda: A service mesh consists of two main parts: a control plane and a data plane. The data plane is the proxy layer, where service communication happens. When you, as a user, interact with the service mesh, you interact with the control plane.

The control plane exposes new primitives you can use to control how your services communicate. Those primitives enable you to do tasks you couldn’t do before, like exercising fine-grained control over specific service requests, setting rate limits, managing auth, setting up circuit-breaking logic, enabling distributed tracing, and so forth. You use those primitives to compose service policies inside the control plane, either globally or for a single service. The data plane then reads policies from the control plane and alters its behavior accordingly.
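To make the split between the two planes concrete, here is a minimal sketch in Python. It is not Linkerd's or Conduit's API: the `Policy`, `ControlPlane`, and `Proxy` names, their fields, and the simple request counter are all hypothetical, chosen only to show a proxy reading its policy from a control plane and changing its behavior when that policy changes.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy; the fields are illustrative only."""
    max_requests: int = 100   # requests allowed per window
    require_tls: bool = False


@dataclass
class ControlPlane:
    """Holds one global policy plus optional per-service overrides."""
    global_policy: Policy = field(default_factory=Policy)
    overrides: Dict[str, Policy] = field(default_factory=dict)

    def policy_for(self, service: str) -> Policy:
        # A per-service policy wins over the global one.
        return self.overrides.get(service, self.global_policy)


class Proxy:
    """Data-plane proxy that looks up its policy on every request."""

    def __init__(self, service: str, control_plane: ControlPlane) -> None:
        self.service = service
        self.control_plane = control_plane
        self.seen = 0  # requests handled in the current window

    def handle(self, request: str) -> str:
        policy = self.control_plane.policy_for(self.service)
        self.seen += 1
        if self.seen > policy.max_requests:
            return "rejected: rate limit exceeded"
        scheme = "tls" if policy.require_tls else "plaintext"
        return f"forwarded {request} over {scheme}"


control = ControlPlane(global_policy=Policy(max_requests=2))
proxy = Proxy("billing", control)
results = [proxy.handle(f"req-{i}") for i in range(3)]

# Override the policy for one service; the proxy sees it on its next request.
control.overrides["billing"] = Policy(max_requests=10, require_tls=True)
after_update = proxy.handle("req-3")
```

In this sketch the third request is rejected under the global limit of two, and the fourth is forwarded over TLS once the per-service override is in place. The application code that produced the requests never changes, which is the point of moving this logic into the mesh.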

Linux.com: What has the response to Linkerd been?

George Miranda: It’s been phenomenal. We’ve had trillions of requests served by Linkerd for customers in production across a wide range of industries. We have an active community of contributors, open source users, and enterprise customers. We’ve seen the service mesh used in ways we couldn’t have imagined when we first created it.

For example, one of our customers used Linkerd to enable a move to the cloud. They make an ERP platform that obviously contains sensitive customer data. They started modernizing their application stack and made a move to microservices. As with most companies, that meant that their dev teams started to own different parts of what used to be one giant monolith. Some dev teams were great about managing sensitive data, while others did that inconsistently or not at all. When faced with the prospect of moving that data to the cloud, their Information Security team quickly put a stop to those ambitions.

Then they implemented Linkerd. They used the service mesh to take the burden of managing secure service communication off their development teams. Instead, their dev teams could all configure their apps to make plain HTTP calls to remote services. At the wire level, Linkerd would then do a protocol upgrade to ensure all communication was happening over TLS by default. Suddenly, the platform team could easily ensure that data in transit was encrypted consistently, no matter which application was in use. They were able to work with their Information Security team to find a public cloud vendor up to their standards, and that’s where they’re running today. That never would have happened for them without Linkerd.

Linux.com: Are there things that catch new users off guard?

George Miranda: There are different ways to deploy the service mesh. Because it’s a series of interconnected proxies, you have options for how that’s set up. Some users prefer having one proxy deployed per physical host or VM that your containers run from. All containerized processes then route traffic through localhost and the service mesh takes it from there. But attaching the proxy to one physical or virtual host can make management more difficult if you’re not always sure where your containerized processes are running.

A common approach these days is to run the service mesh as a container sidecar and not worry about which proxy lives on each container host. The downside is that resource utilization can become a big concern in that pattern. If you have hundreds of containers on any one host, the footprint required for the service mesh suddenly begins to matter.
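A rough back-of-the-envelope calculation shows why the footprint matters in the sidecar pattern. The Python sketch below is illustrative only: the host count, container count, and per-proxy memory figures are made-up assumptions, not measurements of Linkerd or Conduit.

```python
def mesh_memory_mb(hosts: int, containers_per_host: int,
                   proxy_mb: int, sidecar: bool) -> int:
    """Total memory used by mesh proxies across a cluster.

    Per-host deployment runs one proxy per host; sidecar deployment
    runs one proxy per container.
    """
    proxies_per_host = containers_per_host if sidecar else 1
    return hosts * proxies_per_host * proxy_mb


# Hypothetical cluster: 10 hosts with 200 containers each.
per_host = mesh_memory_mb(10, 200, proxy_mb=150, sidecar=False)
sidecar_heavy = mesh_memory_mb(10, 200, proxy_mb=150, sidecar=True)
sidecar_light = mesh_memory_mb(10, 200, proxy_mb=10, sidecar=True)
```

Under these assumed numbers, the per-host layout uses 1,500 MB in total, the same proxy as a sidecar uses 300,000 MB, and a much smaller sidecar proxy uses 20,000 MB. The multiplier is the number of containers per host, so every megabyte saved per proxy is multiplied across the whole fleet.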

The service mesh needs to be remarkably small, lightweight, and incredibly fast. You don’t want to have to choose between having resilient services and sacrificing performance. You should barely be able to notice that the service mesh is even there. That’s been one of the drivers behind why we just released Conduit.

Linux.com: With all of the success behind Linkerd, why are you introducing Conduit now?

George Miranda: At Buoyant, we asked ourselves what it would take to build the ideal service mesh from the ground up, but with all the lessons we’d learned from the past 18 months of running a service mesh in production. The answer was Conduit.

Conduit’s Rust-based data plane is crazy fast. With sub-millisecond latency and a tiny memory footprint, it’s designed to give you the most frequently used benefits of the service mesh without getting in your way. Rust’s memory-safety benefits also help prevent introducing attack vectors that expose your services to additional risk. Conduit is incredibly fast, ultralight, and fundamentally secure. It’s easy to use, easy to get started with, and a great way to manage Kubernetes-based microservices.

Linux.com: Are there any talks in particular to watch out for at CloudNativeCon + KubeCon North America?

George Miranda: The service mesh is all over CloudNativeCon’s agenda. To me, that validates the need for the service mesh as a fundamental building block in the cloud-native stack. KubeCon + CloudNativeCon is a great place to learn more about how the service mesh can help you manage your stack.

We’ll be talking about both Linkerd and Conduit, starting with the Pancake Breakfast on Wednesday morning. For production-grade and multi-platform use cases requiring a feature-rich approach with deep integrations for modern tooling, check out Linkerd and the many customer talks about how it’s used in their stacks. For a next-gen, ultralight service mesh specific to Kubernetes, check out Conduit. You’ll hear about Conduit in the CNCF keynotes, and we’ll dive deep into it in both our SIG and the Linkerd Salon. Check out our schedule and make sure to swing by our booth for demos.