Redundant routers with Linux and Keepalived

Author: JT Smith

A router is typically a single point of failure. If all your clients connect to the Internet through one router, any failure of that system will result in the loss of all Internet connectivity. There are various solutions to this problem, but one of the easiest to set up is Keepalived, a daemon that uses the Virtual Router Redundancy Protocol (VRRP) to allow two or more redundant routers to present a virtual router IP address to the systems on your network.

To understand how you can use Keepalived, imagine that you are a system administrator in charge of a group of systems. Since you are the careful type, you do everything as redundantly as possible. You use RAID on your file servers to guard against drive failures. You back up your data to tape regularly. You stock spare parts for your systems. However, you know that a glaring single point of failure exists on your network: the router. That one system sits between all those workstations and the outside world. If it fails, your users lose all connectivity to the Internet and to other networks in your organization. What can you do to guard against this?

The answer is multiple redundant routers.

At first glance this solution isn’t so simple, because standard client computers (whether running Windows, Linux, or something else) generally don’t run routing protocols. That is, they don’t understand how to deal with multiple routers. Instead, they are configured with one gateway router address. Any traffic that isn’t destined for the local network just gets forwarded on to the router.

This setup is simple and reliable. However, it leaves no room for redundancy because the systems on your network will continue to blindly attempt to use your router even if it is down.
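The forwarding decision a client makes can be sketched in a few lines of Python. This is only an illustration of the single-gateway logic described above, not how any particular operating system implements it. The network and gateway values are assumptions chosen to match the addresses used later in this article, and next_hop is a hypothetical helper name.

```python
import ipaddress

# Assumed client settings, matching the addresses used later in this article.
LOCAL_NETWORK = ipaddress.ip_network("192.168.1.0/24")
DEFAULT_GATEWAY = ipaddress.ip_address("192.168.1.1")


def next_hop(destination):
    """Return the address a client hands a packet to (hypothetical helper)."""
    dest = ipaddress.ip_address(destination)
    if dest in LOCAL_NETWORK:
        # Local traffic is delivered directly to the destination.
        return dest
    # Everything else goes to the one configured gateway, whether it is up or not.
    return DEFAULT_GATEWAY


print(next_hop("192.168.1.42"))   # a neighbor on the local network
print(next_hop("203.0.113.10"))   # an outside address, sent to the gateway
```

Nothing in this logic checks whether the gateway is alive, which is why the redundancy has to be provided behind the gateway address itself.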

The solution

Fortunately, free software and open standards provide an answer to this problem in the form of Keepalived and VRRP. VRRP allows multiple routers to monitor each other and act as one virtual router.

One router is designated as the master and controls a virtual router IP address in addition to its regular IP address. The other router is a slave and does nothing but watch the master. If the master stops responding, the slave takes over the virtual router IP address. The slave continues to look for the master and, when it comes back, the master once again takes over the virtual address.
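The slave's behavior can be modeled as a small state machine. The sketch below is a toy model rather than Keepalived's actual code: it assumes one tick per second, and the three-tick timeout is an illustrative assumption, not a value taken from this article. The function name slave_states is hypothetical.

```python
def slave_states(master_heard, timeout=3):
    """Return the slave's state for each tick of a toy timeline.

    master_heard: one boolean per tick, True if the slave heard the master.
    timeout: silent ticks in a row before the slave takes over (assumed value).
    """
    states = []
    silent_ticks = 0
    for heard in master_heard:
        if heard:
            # The master is back (or still there): defer to it again.
            silent_ticks = 0
        else:
            silent_ticks += 1
        states.append("MASTER" if silent_ticks >= timeout else "BACKUP")
    return states


# The master is heard for two ticks, silent for four, then returns.
timeline = [True, True, False, False, False, False, True, True]
for tick, state in enumerate(slave_states(timeline)):
    print(tick, state)
```

In this model the slave holds the virtual address only during the stretch where the master has been silent for at least the timeout, and gives it up as soon as the master is heard again.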

VRRP is supported by most modern commercial routers. Since VRRP is an IETF specification, anyone can implement it, and an implementation for Linux comes as part of the Keepalived package. Thus, you can create a redundant router setup with two Linux servers each running Keepalived.

Although Keepalived is targeted primarily at keeping Web server farms and the like running, it works perfectly fine for VRRP. You can just ignore all the other features of Keepalived.

Installation and configuration

You can download and build Keepalived in the usual fashion. Alternatively, there may be a prepackaged Keepalived for your Linux distribution. I encourage you to check out the latest version on keepalived.org, because Keepalived has been under active development recently.

The actual operation of Keepalived is controlled by the configuration file (by default /etc/keepalived/keepalived.conf). This is where you designate which router is the master and which is the slave.

Here’s a minimal sample master keepalived.conf file:

vrrp_instance VI_1 {
 state MASTER
 interface eth0
 virtual_router_id 1
 priority 100
 authentication {
  auth_type PASS
  auth_pass password
 }
 virtual_ipaddress {
  192.168.1.1/24 brd 192.168.1.255 dev eth0
 }
}

And here’s a sample slave keepalived.conf file:

vrrp_instance VI_1 {
 state BACKUP
 interface eth0
 virtual_router_id 1
 priority 50
 authentication {
  auth_type PASS
  auth_pass password
 }
 virtual_ipaddress {
  192.168.1.1/24 brd 192.168.1.255 dev eth0
 }
}

There are many other keepalived.conf configuration options, but these settings are sufficient to get you started. Check the keepalived.conf man page for the rest. Both the master and slave must have the same virtual_router_id. The lower priority in the slave keepalived.conf indicates that it should defer to the master (that is, whenever the master is alive, the master will advertise the virtual IP address). Set the master priority 50 higher than any slave priority so that the master always takes over automatically when it is available.

The state setting in each keepalived.conf file indicates what state Keepalived should start up in. You want the master to be in state MASTER and the slave in state BACKUP.
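Before starting the daemons, it can help to confirm that the two files agree where they must. The following Python sketch checks only the rules stated above (matching virtual_router_id, higher master priority, expected state values). It uses a naive line-based lookup that assumes a single vrrp_instance per file, and the helper names setting and check_pair are hypothetical, not part of Keepalived.

```python
import re

# Trimmed copies of the sample files above (authentication and address blocks omitted).
MASTER_CONF = """
vrrp_instance VI_1 {
 state MASTER
 interface eth0
 virtual_router_id 1
 priority 100
}
"""

BACKUP_CONF = """
vrrp_instance VI_1 {
 state BACKUP
 interface eth0
 virtual_router_id 1
 priority 50
}
"""


def setting(conf, key):
    """Return the value of the first 'key value' line (assumes one vrrp_instance)."""
    match = re.search(rf"^\s*{re.escape(key)}\s+(\S+)", conf, re.MULTILINE)
    if match is None:
        raise KeyError(key)
    return match.group(1)


def check_pair(master_conf, backup_conf):
    """Return a list of problems; an empty list means the pair looks consistent."""
    problems = []
    if setting(master_conf, "virtual_router_id") != setting(backup_conf, "virtual_router_id"):
        problems.append("virtual_router_id differs")
    if int(setting(master_conf, "priority")) <= int(setting(backup_conf, "priority")):
        problems.append("master priority is not higher than slave priority")
    if setting(master_conf, "state") != "MASTER":
        problems.append("master state is not MASTER")
    if setting(backup_conf, "state") != "BACKUP":
        problems.append("slave state is not BACKUP")
    return problems


print(check_pair(MASTER_CONF, BACKUP_CONF))
```

Passing the two files in the wrong order is a quick way to see the checks fire, since the priorities and states then contradict each other.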

Note that you can have multiple slave routers. In that case, give each slave its own priority (say 48, 49, and 50 for the example above). The slave with the highest priority will take over as master if necessary. If that one fails as well, the next highest takes over, and so on.
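The takeover order among several slaves follows directly from the priorities. Here is a minimal sketch of that rule in Python, assuming a hypothetical set of router names and the priorities suggested above; elect_master is an illustrative helper, not a Keepalived function.

```python
def elect_master(priorities, failed=()):
    """Return the name of the highest-priority router that has not failed."""
    live = {name: prio for name, prio in priorities.items() if name not in failed}
    if not live:
        return None  # every router is down
    return max(live, key=live.get)


# Hypothetical router names; priorities follow the example in the text.
ROUTERS = {"master": 100, "slave-a": 50, "slave-b": 49, "slave-c": 48}

print(elect_master(ROUTERS))                                 # all routers up
print(elect_master(ROUTERS, failed={"master"}))              # master down
print(elect_master(ROUTERS, failed={"master", "slave-a"}))   # two routers down
```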

The only configuration necessary on your client systems is to ensure they use the virtual IP address as their gateway, and not the real address of the router. If the clients already point at the master router, it may be easiest to move its real IP address and set the virtual IP address to that of the old master router address. That way, you avoid having to change the settings on each client.

Note that I’m neglecting the issue of what happens beyond the routers in terms of redundancy. Obviously, if your only external connection is plugged into just the master router, you will lose that connection when the master goes down. You will have to do some further work to set up redundancy on the other side of the routers.

Testing

After you have configured Keepalived, start it on all your routers by running the init script (/etc/init.d/keepalived start). You should see messages in /var/log/messages on each machine indicating which is the master and which is the backup.

Let’s assume your master has the real IP address of 192.168.1.250 and your slave is 192.168.1.251 (remember the virtual IP address is 192.168.1.1). Run ip addr list eth0 on each machine, assuming your Ethernet device is eth0. On the master you should see something like this:

2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
  link/ether 00:e0:81:2b:aa:b5 brd ff:ff:ff:ff:ff:ff
  inet 192.168.1.250/24 brd 192.168.1.255 scope global eth0
  inet 192.168.1.1/24 brd 192.168.1.255 scope global secondary eth0

and the output on the slave should be something like:

2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
  link/ether 00:e0:81:2b:aa:c3 brd ff:ff:ff:ff:ff:ff
  inet 192.168.1.251/24 brd 192.168.1.255 scope global eth0

These entries indicate that the master router is controlling both its address and the virtual IP address.
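If you want to check this from a script rather than by eye, the inet lines are easy to pick out. The Python sketch below parses output in the format shown above and reports whether the virtual IP address is present. The sample text is a stand-in for real command output, and the helper names are hypothetical.

```python
import re

VIRTUAL_IP = "192.168.1.1"

# Sample text in the format shown above, standing in for real command output.
MASTER_OUTPUT = """\
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
  link/ether 00:e0:81:2b:aa:b5 brd ff:ff:ff:ff:ff:ff
  inet 192.168.1.250/24 brd 192.168.1.255 scope global eth0
  inet 192.168.1.1/24 brd 192.168.1.255 scope global secondary eth0
"""

SLAVE_OUTPUT = """\
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
  link/ether 00:e0:81:2b:aa:c3 brd ff:ff:ff:ff:ff:ff
  inet 192.168.1.251/24 brd 192.168.1.255 scope global eth0
"""


def ipv4_addresses(ip_addr_output):
    """Return the IPv4 addresses (without prefix length) on the inet lines."""
    return re.findall(r"^\s*inet (\d+\.\d+\.\d+\.\d+)/", ip_addr_output, re.MULTILINE)


def holds_virtual_ip(ip_addr_output, virtual_ip=VIRTUAL_IP):
    """True if the interface currently carries the virtual IP address."""
    return virtual_ip in ipv4_addresses(ip_addr_output)


print(holds_virtual_ip(MASTER_OUTPUT))
print(holds_virtual_ip(SLAVE_OUTPUT))
```

On a live router, the text could instead come from subprocess.run(["ip", "addr", "list", "eth0"], capture_output=True, text=True).stdout, assuming the interface is named eth0.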

To test failover, unplug the Ethernet cable from the master router. You should see log messages on the slave system indicating it has lost track of the master and has become the master. You won’t see any messages in the master syslog until you reconnect it to the network, because it doesn’t realize it is no longer the master. When you reconnect the master, the slave will hand control of the virtual IP address back to the master.
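How long the slave waits before taking over can be estimated from the VRRP version 2 specification (RFC 3768), which defines the master-down interval as three advertisement intervals plus a skew time of (256 - priority) / 256 seconds. The sketch below applies that formula. It assumes VRRPv2 timing and a one-second advertisement interval, so treat the result as a rough expectation rather than a guarantee for your Keepalived version and configuration.

```python
def master_down_interval(priority, advert_int=1.0):
    """Seconds of silence before a backup declares the master down (VRRPv2 formula).

    priority: the backup's own VRRP priority.
    advert_int: advertisement interval in seconds (assumed to be 1 here).
    """
    skew_time = (256 - priority) / 256
    return 3 * advert_int + skew_time


# Priorities taken from the slave examples in this article.
for priority in (50, 49, 48):
    print(priority, master_down_interval(priority))
```

Because the skew time shrinks as the priority rises, the highest-priority slave times out first, which is consistent with it being the one that takes over.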

So what happens on the clients when the master router fails? VRRP does not handle connection migration, so you will lose any open connections. For example, any SSH sessions going on through the master router will die. In practice, this is acceptable because most client machines are using the network for things like Web browsing, which open a separate connection each time. All the users will see is a Web page that fails to load or perhaps is interrupted partway through. They will click on their reload buttons and everything will be fine.

Conclusion

It’s actually pretty simple to set up multiple redundant routers on your network using Keepalived and VRRP. The configuration details can be a bit confusing, but that is largely due to Keepalived having so many other features beyond VRRP support. However, if you follow the sample config files above, you will have Keepalived running in short order and your network will be more robust.