CLI Magic: rsync for backups

Author: Joe Barr

More than one crusty old geek has suggested that we do a CLI Magic
on using rsync. Ever
attentive to COGs, we’ve finally acted on those suggestions. In
particular, we’ll use it and ssh to do secure backups over a LAN,
just as someone with multiple machines on a LAN at home or in a
small business might want to do. Remember, rsync runs from the
command line, so bust open a console and let’s get started.

There are eight different ways to use rsync, and far too much
functionality to cover in a brief column. We’re going to focus on
just one usage of this fine tool: bringing a set of files on one
machine into sync with the same set of files on another, backing
them up without the need to transfer anything but the differences
between them. It’s a modern-day version of rcp, but it’s faster,
smarter, and, when run over ssh, more secure.

For our purposes, we’ll assume that there are only two machines
on the LAN. We’ll call the first one desktop and the second server.
As you might expect, we’ll be backing up data from the desktop to
the server. Our goal is to have a recent backup — no more than a
day out of date at any time — so that in the event of a hard drive
failure, a clean install, or simply on a whim, we can quickly and
easily restore a user’s data from the server to the desktop.

First things first

You have to have rsync installed on both machines. Since we are
going to be using ssh to encrypt our rsync communications, it will
need to be installed on both machines as well. Both rsync and ssh
are included in modern Linux distributions, so it shouldn’t be a
problem for you to install them from your distribution’s normal
repository or CD if either one is not already there. If worst comes
to worst, you can always grab the source for these tools from their project sites and build them yourself. To
use ssh, you need to have a user account on both systems as
well.

To keep things simple, we’ll back up the contents of
/home/user on the desktop to
/backup/home/user on the server. Here’s how it is
done. The template of the rsync command is:

rsync OPTIONS SOURCE DESTINATION

So, assuming that we will initiate the copy from the desktop
machine, the basic format of the command, before we add any
options, is this:

rsync /home/user/ username@server:/backup/home/user

Note that user and username will be
the same: the name of your account and the name of the account’s
home directory. The server name in the destination
specification can either be the IP address of the remote machine,
or whatever you call it in /etc/hosts. Also note the
trailing slash at the end of the source specification: that keeps
you from ending up with an additional directory (as in
/backup/home/user/user) on the destination
machine.

What are the options?

Modern rsync uses ssh by default to talk to remote
machines. You can override this default in a couple of ways, but
since the default is our desired mode, we won’t go into that. If
you are using an older version of rsync, you can specify the
transport by using the -e ssh option. Otherwise, it’s already taken
care of for you.

Since we are doing a backup, we want all the data in the directory
we are backing up as well as in all its subdirectories. So the first
option we need is an r for recursive. There are a number of other
options we want, too. But thanks to the magic of the -a option, we
don’t need to specify any of them individually. The -a option is
shorthand for the options which make the copy recursive, copy
symlinks as symlinks, preserve permissions, preserve file times,
preserve groups, preserve owner, and preserve devices.

That’s a lot of magic for a single option. About the only one
you might want for a backup that’s not in that list is the option
to include hard links. If you want that, add H to your
option list. We also want rsync to tell us what is going on as it
works, so we need a v for verbose.

You may or may not want to compress the data you’re backing up
as it is copied. I don’t do this because of the time involved, but
if you want to, just add a z option.

One more option and we’re good to go. If you have deleted files
from your desktop, you don’t want them to show back up in the event
you need to restore from the backup. So we need to tell rsync to
delete any files in the destination that no longer exist in the
source. It’s as easy as adding --delete after the
other options.

OK, now our command is ready to run. It looks like this:

rsync -av --delete /home/user/ username@server:/backup/home/user

The first backup run will take the longest, since
all of the data will need to be copied. Thereafter, if you run a
daily or weekly cron job to update the backup, it will fly in
comparison, because only data that has changed or been added will
need to be copied in order to keep the source and the destination
in sync.

Give me my data back!

OK. The worst has come to pass. You need to restore from your
backup. What to do? Not a problem. Remember the opening caveats
about having rsync, ssh, and an account on both machines. Then,
from your desktop machine, simply enter:

rsync -av username@server:/backup/home/user/ /home/user

Now you’ve got everything that was in your home
directory as of your last rsync run. You can also use rsync to do copies from and to the same machine, or across the Internet. There
are dozens and dozens of sophisticated options for you to explore
and tinker with. Take a look at man rsync and you’ll
see what I mean.