How Service Discovery Works in Containerized Applications Using Docker Swarm Mode

When I first started considering container use in production environments, a question came to mind: When a container may exist across a cluster of servers, how do I get others (people, or applications) to connect to it reliably, no matter where it is in the cluster?

Of course, this problem existed to a degree in the olden days of virtual (or not) machines as well. Back in the Old Days of three-tier webapp stacks, this was handled gracefully as follows:

·  Load balancers had hard-coded IP addresses for the web servers they load-balanced across

·  Web servers had hard-coded IPs of the application servers they used for application logic

·  Application servers had bespoke, hand-crafted definitions of the databases they queried for data to provide back to the web servers

This was “simple” as long as web, application, or database servers weren’t replaced. If and when they were, the “smarter” of us ensured the new system used the IP address of its predecessor. (See? Simple! Who needs DevOps?) (Bonus points to those who used internal DNS instead of IPs.)

Solutions arose in time. ZooKeeper comes to mind, but it was far from alone. Service discovery is getting more attention now as complexity has increased: Where before there might have been 10 VMs, there may now be 200-300 containers, and their lifecycles are significantly shorter than that of a VM.

Following Docker’s “batteries included, but can be replaced” philosophy, Docker Swarm mode comes with a built-in DNS server. This provides users with simple service discovery; if at some point their needs surpass the design goals of the DNS server, they can use third-party discovery services instead (covered in our next blog post!).

Getting started

There are plenty of resources available on the Internet that discuss installing Docker Swarm mode, so that won’t be repeated here. For this post, I’ll be using a Vagrant configuration that I forked on GitHub and extended with some port forwarding. If you have Vagrant and VirtualBox installed, bringing up a Docker Swarm mode cluster is as easy as:

$ git clone https://github.com/jlk/docker-swarm-mode-vagrant.git
Cloning into 'docker-swarm-mode-vagrant'...
remote: Counting objects: 23, done.
remote: Total 23 (delta 0), reused 0 (delta 0), pack-reused 23
Unpacking objects: 100% (23/23), done.
$ cd docker-swarm-mode-vagrant/
$ vagrant up

After the last command, take a break to stretch your legs – it usually takes 5-10 minutes for Vagrant to download the Ubuntu VM image, bring up three VMs, update packages on each, install Docker, and join the VMs to a swarm mode cluster. Once that completes, you should be able to SSH into the manager node and list the members of the swarm with:


$ vagrant ssh node-1
vagrant@node-1:~$ docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
9f22lo0cthxn64w79arje5rqg    node-2    Ready   Active
p2yg78i4fmzwglu8lp4j1cebc *  node-1    Ready   Active        Leader
tp9h7cpef13fzeztje38igs4s    node-3    Ready   Active

(Your IDs will differ, since each one is randomly generated.)

Launch a WordPress cluster

Next, let’s launch a WordPress cluster of two WordPress containers backed by a MariaDB database. To make this easy, I’ve created another GitHub project containing a docker-compose file to build the cluster. Let’s clone the project and bring up the containers:


vagrant@node-1:~$ git clone http://github.com/jlk/wordpress-swarm.git
Cloning into 'wordpress-swarm'...
remote: Counting objects: 7, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 7 (delta 0), reused 4 (delta 0), pack-reused 0
Unpacking objects: 100% (7/7), done.
Checking connectivity... done.
vagrant@node-1:~$ cd wordpress-swarm
vagrant@node-1:~/wordpress-swarm$ docker stack deploy --compose-file docker-stack.yml wordpress
Creating network wordpress_common
Creating service wordpress_wordpress
Creating service wordpress_dbcluster
vagrant@node-1:~/wordpress-swarm$

In the background, Docker is scheduling those containers to run across the swarm, downloading images, and spinning up the containers. Depending on your computer and network speeds, after about a minute you should be able to see the services running:


vagrant@node-1:~/wordpress-swarm$ docker service ls
ID            NAME                 MODE        REPLICAS  IMAGE
fyhqrei7hz75  wordpress_dbcluster  replicated  1/1       toughiq/mariadb-cluster:latest
ojbyktsyrmla  wordpress_wordpress  replicated  2/2       wordpress:php7.1-apache
vagrant@node-1:~/wordpress-swarm$
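If you would rather not eyeball the REPLICAS column while you wait, the check is easy to script. Here is a minimal Python sketch, which is not part of the demo repository: replicas_ready is a hypothetical helper name, and it assumes the column layout shown in the output above (whitespace-separated columns, with no spaces inside the NAME, MODE, or REPLICAS values).

```python
# Sample text copied from the `docker service ls` output shown in this post.
SAMPLE = """\
ID            NAME                 MODE        REPLICAS  IMAGE
fyhqrei7hz75  wordpress_dbcluster  replicated  1/1       toughiq/mariadb-cluster:latest
ojbyktsyrmla  wordpress_wordpress  replicated  2/2       wordpress:php7.1-apache
"""


def replicas_ready(service_ls_output):
    """Map each service name to True when running replicas == desired replicas."""
    lines = [ln for ln in service_ls_output.splitlines() if ln.strip()]
    header = lines[0].split()
    name_col = header.index("NAME")
    replicas_col = header.index("REPLICAS")

    status = {}
    for line in lines[1:]:
        fields = line.split()
        # REPLICAS looks like "2/2": running tasks / desired tasks.
        running, desired = fields[replicas_col].split("/")
        status[fields[name_col]] = int(running) == int(desired)
    return status


if __name__ == "__main__":
    print(replicas_ready(SAMPLE))
    # {'wordpress_dbcluster': True, 'wordpress_wordpress': True}
```

In practice you would feed it the text captured from docker service ls and loop until every value is True.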

At this point, you should be able to load http://localhost:8080 in a browser and see the initial WordPress configuration screen.

Take a look at the docker-stack.yml file. You’ll see environment variables passed to the WordPress containers instructing them to connect to a MariaDB database with a hostname of dbcluster, which is the name listed for the database service. It’s just a string passed into the container; there’s no defined link between the two services. In older versions of this demo, we would have had to create a “link” between the wordpress and dbcluster services in the docker-stack.yml file for the wordpress containers to recognize and use the dbcluster hostname. This would have looked like:


    services:
      wordpress:
        ...
        links:
          - dbcluster

Instead, what’s happening here is that after Docker creates the dbcluster container, it automatically publishes an A record in its DNS service so other containers can find it when they perform a DNS name lookup. If you look at the first log lines of one of the WordPress containers, you may see that at first it’s unable to find the dbcluster host, and after a few tries the lookup succeeds. The connection is then refused while MariaDB is starting up. WordPress keeps attempting to establish a database connection, and once the database is up, it connects and we’re up and running:


Warning: mysqli::__construct(): php_network_getaddresses: getaddrinfo failed: Name or service not known in - on line 22
Warning: mysqli::__construct(): (HY000/2002): php_network_getaddresses: getaddrinfo failed: Name or service not known in - on line 22
MySQL Connection Error: (2002) php_network_getaddresses: getaddrinfo failed: Name or service not known
Warning: mysqli::__construct(): (HY000/2002): Connection refused in - on line 22
MySQL Connection Error: (2002) Connection refused

AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.0.0.3. Set the 'ServerName' directive globally to suppress this message
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.0.0.3. Set the 'ServerName' directive globally to suppress this message
[Mon Feb 27 16:16:43.086761 2017] [mpm_prefork:notice] [pid 1] AH00163: Apache/2.4.10 (Debian) PHP/7.1.2 configured -- resuming normal operations
[Mon Feb 27 16:16:43.086836 2017] [core:notice] [pid 1] AH00094: Command line: 'apache2 -D FOREGROUND'
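The pattern in those logs (keep retrying until the name resolves, then until the port accepts connections) is worth building into any client of a swarm service. Below is a minimal Python sketch of that retry loop. It is an illustration only, not the WordPress image's actual startup code: wait_for_service is a hypothetical helper, and the attempt count, delay, and timeout are arbitrary assumptions.

```python
import socket
import time


def wait_for_service(host, port, attempts=30, delay=2.0,
                     connect=socket.create_connection, sleep=time.sleep):
    """Retry a TCP connection until it succeeds or attempts run out.

    Returns the list of failure reasons seen before the first success.
    Raises TimeoutError if every attempt fails.
    """
    failures = []
    for _ in range(attempts):
        try:
            conn = connect((host, port), timeout=5)
        except socket.gaierror:
            # The DNS record is not published yet ("Name or service not known").
            failures.append("name not resolved")
        except ConnectionRefusedError:
            # The name resolves, but nothing is listening on the port yet.
            failures.append("connection refused")
        except OSError as exc:
            failures.append(f"other error: {exc}")
        else:
            conn.close()
            return failures
        sleep(delay)
    raise TimeoutError(
        f"{host}:{port} not reachable after {attempts} attempts: {failures}"
    )
```

From a container on the same overlay network, wait_for_service("dbcluster", 3306) would block until the database accepts connections, assuming MariaDB listens on its default port of 3306. The connect and sleep parameters exist only so the loop can be exercised without a real network.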

Scaling the Database

Next, let’s try scaling up the database and see what happens. The database container image I picked to use here has been built with MariaDB’s Galera clustering enabled, configured to discover cluster members via multicast. While DNS-based service discovery is built into Docker Swarm Mode, the more complex process of multi-master replication is still left for an application to figure out, thus the Galera clustering functionality is required. Let’s scale the database cluster up to three nodes:

vagrant@node-1:~/wordpress-swarm$ docker service scale wordpress_dbcluster=3
wordpress_dbcluster scaled to 3

Docker spins up two more containers and adds them to a load-balanced pool behind the dbcluster virtual IP address. The containers start, discover each other via multicast, and sync up. Once you see the message below in their logs (after about 30 seconds), we have a three-node db cluster:

2017-02-27 16:49:42 139688903960320 [Note] WSREP: Member 2.0 (84e5bc4c66b9) synced with group.

Try loading the WordPress site again in your browser – it should still work! At this point, when the WordPress containers attempt to connect to dbcluster, the request is load-balanced by Docker across the three dbcluster containers. The IP address for dbcluster that is published in Docker’s DNS is a “virtual IP,” and behind the scenes Docker load balances traffic to the cluster members using IPVS. If the members were not kept in sync by multi-master replication, this would cause significant confusion, if it worked at all.
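You can observe the virtual IP from inside any container attached to the same network. The sketch below is a minimal Python illustration under a few assumptions: resolve_ipv4 and vip_and_tasks are hypothetical helper names, and the tasks.<service-name> lookup is not something covered in this post; it comes from Docker's swarm networking documentation, which describes it as returning the individual task IPs rather than the single virtual IP.

```python
import socket


def resolve_ipv4(name, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses that `name` resolves to.

    Returns an empty list when the name cannot be resolved.
    """
    try:
        infos = resolver(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


def vip_and_tasks(service, resolver=socket.getaddrinfo):
    """Look up a service's virtual IP and, separately, its per-task IPs."""
    return {
        "vip": resolve_ipv4(service, resolver),
        # Assumption: Docker's embedded DNS answers tasks.<service-name>.
        "tasks": resolve_ipv4("tasks." + service, resolver),
    }
```

With the database scaled to three, I would expect vip_and_tasks("dbcluster") to show one address under "vip" and three under "tasks", but treat that as something to verify on your own cluster rather than a guarantee.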

Finally, let’s scale the database cluster back to a single node. While I’d want to be very, very certain of what I was doing (and the state of my backups) before trying this in production, for this demo we can be carefree and try:

vagrant@node-1:~/wordpress-swarm$ docker service scale wordpress_dbcluster=1
wordpress_dbcluster scaled to 1
vagrant@node-1:~/wordpress-swarm$

With that, two containers will be gracefully shut down, the MariaDB cluster returns to a size of one, and WordPress should still be running happily.

Docker Swarm mode service discovery works quite well, and helps us to loosely define relationships between parts of an application. There are limitations to what it can do, though – we’ll cover those in future posts.

John Kinsella has long been active in open source projects – first using Linux in 1992, recently as a member of the PMC and security team for Apache CloudStack, and now active in the container community. He enjoys mentoring and advising people in the information security and startup communities. At the beginning of 2016 he co-founded Layered Insight, a container security startup based in Silicon Valley where he is the CTO. His nearly 20-year professional background includes datacenter, security and network operations, software development, and consulting.