Automating Infrastructure Deployment for Kubernetes


Many organizations run Kubernetes clusters in a single public cloud, such as GCE or AWS, so they have reasonably homogeneous infrastructure needs, says Alena Prokharchyk, Principal Software Engineer at Rancher Labs. In these situations, deploying Kubernetes clusters is relatively straightforward. Other organizations, however, may need to deploy Kubernetes across multiple clouds and data centers, which can lead to challenges.

Prokharchyk, who will be speaking along with Brian Scott of The Walt Disney Company at KubeCon in Seattle, shared more about these challenges and how Rancher Labs has worked with various organizations to solve them.

Alena Prokharchyk, Principal Software Engineer at Rancher Labs

Linux.com:  Are there any challenges when deploying Kubernetes clusters within an organization with diverse infrastructure?

Alena Prokharchyk: While Kubernetes is designed to run on diverse infrastructure, organizations still face the challenge of preparing each of these infrastructure environments in different ways. Setting up the etcd cluster, starting a Kubernetes master and kubelets, configuring various storage and networking drivers, and setting up a load balancer often require different scripts and steps for different infrastructure environments.

As we’ll discuss at KubeCon, we address these challenges by creating a common set of infrastructure services (networking, storage, and load balancer) across diverse public clouds, private clouds, virtualization clusters, and bare metal servers. From there, a common set of tools based on Rancher can be used to automate the setup, ongoing management, and upgrade of heterogeneous Kubernetes clusters. Introducing a new declarative configuration language to solve this problem is something we tried to avoid, as it would have been another learning step for system administrators.  

On Rancher, we also decided to containerize the entire Kubernetes cluster deployment, and to orchestrate those deployments. This approach allows users to describe the application itself, as well as the dependencies between different services. It also makes it simple to scale the cluster as new resources are added.

Linux.com:  Are there any best practices for automating the deployment of multiple Kubernetes clusters?

Alena: There are a couple of ways to do this. Kubernetes now ships with support for a rich set of cloud providers, which makes it easy to set up Kubernetes clusters. There is also an increasing number of tools (such as the kubeadm tool in 1.4) that automate the deployment of Kubernetes clusters. However, we still lack tools that can fully automate both the deployment of Kubernetes and the infrastructure elements on which Kubernetes relies. The industry has not yet established a set of best practices to deploy multiple Kubernetes clusters. In our talk, we will show how we might be able to accomplish this using the Rancher container management software.

Managing infrastructure is just as important as managing Kubernetes deployments. It is critical to provide an easy way of adding and removing hosts, to provide an overlay network and DNS, and to detect host failures – all of which is necessary to ensure a smoothly running Kubernetes cluster. This part should always be automated first.

Lastly, protecting your data is always important, and we advise users to pay extra attention to etcd, HA, and disaster recovery; automating this process always pays off. For many enterprises, even large ones, losing etcd quorum is not uncommon – we advise periodically backing up etcd clusters so they can be easily restored and recovered after losing quorum.
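
To make the quorum point concrete: etcd is built on the Raft consensus protocol, so a cluster of n members needs a majority (n // 2 + 1) of them healthy to keep accepting writes. The short Python sketch below is only an illustration of that arithmetic, not part of etcd or Rancher, and the function names are hypothetical.

```python
def quorum_size(members: int) -> int:
    """Smallest number of healthy members that still forms a majority."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can fail before the cluster loses quorum."""
    return members - quorum_size(members)


def has_quorum(members: int, healthy: int) -> bool:
    """True if the healthy members still form a majority of the cluster."""
    return healthy >= quorum_size(members)


if __name__ == "__main__":
    # Compare common cluster sizes; note that 4 members tolerate no more
    # failures than 3, which is why odd sizes are usually preferred.
    for size in (1, 3, 4, 5):
        print(size, quorum_size(size), failures_tolerated(size))
```

A three-member cluster survives one failure and a five-member cluster survives two; once more members than that are lost, the cluster has to be restored from a backup, which is why periodic backups matter.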

Linux.com:  What can organizations, both large and small, do to simplify Kubernetes deployments?

Alena: Teams need an easy way to both deploy and upgrade Kubernetes clusters. It should only take one click for the user to upgrade his or her Kubernetes deployment; distributing the latest templates, and notifying users that their clusters are due for updates are initial steps organizations can take to simplify the process.  

Linux.com:  What makes deploying Kubernetes clusters relatively straightforward for enterprises running them in a single public cloud like GCE or AWS?

Alena: Native support for Kubernetes on GCE and AWS is very good. Services like GKE make running Kubernetes on Google Cloud even easier. We actually encourage users to use these tools when they are only interested in running Kubernetes in a single public cloud, as they’re built natively to work with that cloud. If your cloud (and Kubernetes cluster) is homogeneous, you can leverage provider-specific functionality for features like load balancing and persistent storage.

But in our experience, enterprise users are interested in running Kubernetes on multiple public clouds, or on mixed infrastructure; if you want to build a cluster of GCE and AWS instances, AWS features such as ELB or EBS won’t be available to the GCE instances. With Kubernetes on Rancher, we offer an alternative solution for that – Rancher Load Balancer. Its implementation allows users to balance traffic across clouds, and allows them to choose a load balancing provider among choices like HAProxy, NGINX, or Traefik.

Linux.com:  What have been your biggest learnings when working with enterprise IT organizations to solve Kubernetes deployment problems?

Alena: For enterprise IT organizations, managing access control to the Kubernetes cluster is incredibly important; providing a variety of options for managing access control is advisable, as most organizations want to integrate with the solutions they already use. Rancher integrates with Active Directory, Azure AD, GitHub, Local Authentication, and OpenLDAP, and we are planning to add more.

With large-scale Kubernetes clusters, we find that users encounter node and networking failures fairly frequently. As a result, when it comes to defining Kubernetes cluster system services, we include a monitoring option. Furthermore, when such failures occur, Rancher implements self-healing measures to automatically keep the Kubernetes cluster running as expected; those self-healing measures are just as important as automating the deployment of the cluster itself.  
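
As a rough illustration of the monitoring and self-healing idea described above, the Python sketch below marks a host as failed when its heartbeat goes stale and moves that host's workloads to the remaining healthy hosts. It is a simplified assumption of how such a loop could look, not Rancher's actual implementation; HostMonitor, reconcile, and the timeout value are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class HostMonitor:
    """Tracks the last heartbeat time reported by each host (hypothetical)."""
    timeout_seconds: float
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, host: str, now: float) -> None:
        """Record that a host reported in at time `now`."""
        self.last_seen[host] = now

    def failed_hosts(self, now: float) -> list:
        """Hosts whose most recent heartbeat is older than the timeout."""
        return sorted(
            host for host, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


def reconcile(monitor: HostMonitor, placements: dict, now: float) -> dict:
    """Return new placements with workloads moved off failed hosts.

    Workloads on failed hosts are spread round-robin over healthy hosts.
    If no healthy host remains, placements are returned unchanged.
    """
    failed = set(monitor.failed_hosts(now))
    healthy = sorted(h for h in monitor.last_seen if h not in failed)
    if not healthy:
        return dict(placements)
    result = {}
    moved = 0
    for workload, host in sorted(placements.items()):
        if host in failed:
            result[workload] = healthy[moved % len(healthy)]
            moved += 1
        else:
            result[workload] = host
    return result
```

In practice the loop would run continuously against real heartbeats; passing the current time in as a parameter, as done here, simply keeps the sketch deterministic and easy to test.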

Registration for this event is sold out, but you can still watch the keynotes via livestream and catch the session recordings on CNCF’s YouTube channel.