We regular computer users depend so much on digital documents that it would be crazy not to do our best to make sure we never lose them, no matter what happens.
The first part of the solution, of course, is to only save files in open, standard formats, which give the greatest possible guarantee of remaining readable in the future with any software. The other, which is the subject of a three-part miniseries starting today, is to always have backup copies of each file.
In this article we’ll begin with some essential criteria to follow when planning backups. Then we’ll show a very simple backup script. It may not be glamorous, but it will surely work on any distribution or desktop environment you may use in the foreseeable future.
Basic Criteria
The first rule of backups is to store them in a separate place. If you keep both the originals and the backups on the same computer, one hardware failure is all it takes to lose everything for good: always copy your backups to external drives or somewhere online!
The second thing to sort out is exactly what you need to save and how often you must do it. Backing up everything, every day, would just be a big waste of time and space.
Binary programs, for example, should just be reinstalled from the original packages or source code. For log files, instead, install logrotate.
System configuration files are usually stored in the /etc/ directory. User configuration files, instead, are hidden somewhere inside the respective $HOME directories. All these files are small but only change after major changes to your environment (computer, distribution, or ISP) or when you replace some major software component, like Postfix with Exim or KDE with Gnome. Consequently, you can either waste a little bit of space (re)copying them at every backup, or only save copies manually when you actually modify them.
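If you go the manual route, one command right after such a change is enough. Here is a minimal sketch (the destination folder is just an example; run it as root to preserve file ownership):
sudo tar cf /mydata/backups/etc_$(date +%Y%m%d).tar /etc # archive all of /etc in one dated file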
Relational databases like MySQL or PostgreSQL, which are also used as backends by several desktop applications, are more complicated. Backup-wise, they look like bundles of binary files which a server modifies at random intervals. You can’t just make a direct copy of those files and be sure they’ll be usable. Instead, you must tell the server itself to dump all its data into ASCII files, which you can then back up like everything else: the corresponding simple procedures are explained in separate articles, one for MySQL and one for PostgreSQL backups.
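Just to give an idea, these are the basic commands for such dumps (database name, user and output files are only placeholders; the details are in the dedicated articles):
mysqldump -u backupuser -p mydatabase > mydatabase.sql # dump one MySQL database as plain SQL text
pg_dump mydatabase > mydatabase.sql # same thing, for PostgreSQL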
What about email? What we see in our email client window as a “mailbox” containing many messages can be stored on disk either as one single file (mbox format) or as a directory where each message is in a separate file (Maildir and similar). Backup-wise, the second solution is better, at least for mailboxes which change frequently, because you can treat single messages like any other user file, from spreadsheets to movies or picture galleries.
Automated Backups
User files should be backed up often! The simplest and most efficient way to perform automated backups, at least for beginners, is to combine full backups, made every month or week, with daily incremental ones. The latter only archive files changed or added in the last 24 hours, or since the last full backup, saving a lot of disk space.
Simple backup methods concatenate all the files you need to save into one big archive, both for simplicity and portability. File permissions, symbolic links, or file names which are very long or contain unusual, non-ASCII characters may not be preserved on every file system, including those on DVDs or FAT-formatted USB keys. A single archive file with a short, simple name, instead, can be copied anywhere and still preserve all the properties of each file it contains.
Should you compress and/or encrypt those archives? Maybe, maybe not. If most of the archived files are in already compressed formats, from MP3 to OpenDocument, you aren’t going to gain much space with further compression. Besides, if a compressed archive is corrupted, it may become impossible to uncompress it; in that case, you would lose all the files it contained. If some part of a plain tar file is corrupted, instead, the rest is still recoverable. Encryption carries all the same risks, plus that of forgetting the password. It probably makes more sense to find out which of your files are really private, and encrypt each of them separately.
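Should you choose otherwise, compressing an archive or encrypting one really private file takes a single command each (file names below are only examples):
tar czf somearchive.tar.gz somefolder # the “z” option compresses the archive with gzip while creating it
gpg -c my_private_file.odt # symmetric encryption: gpg asks for a passphrase and writes my_private_file.odt.gpg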
Tar Basics
The best compromise between simplicity and flexibility for backups on GNU/Linux systems is the GNU version of the tar (Tape ARchiver) program, which also comes with an extensive online manual.
Tar can create archives from scratch or append new files to existing ones. It is also able to find out which of the files contained in an archive were modified after its creation, and replace them with their newer versions. The “t” and “x” options list and extract the content of the archive file passed through the “f” option:
tar tf somearchive.tar # list content of somearchive.tar
tar xf somearchive.tar # extract all files from somearchive.tar
tar xf somearchive.tar my_resume.odt # extract only my_resume.odt from somearchive.tar
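Appending and updating, as mentioned above, work in the same way:
tar rf somearchive.tar another_file.odt # append another_file.odt to somearchive.tar
tar uf somearchive.tar my_resume.odt # re-archive my_resume.odt only if it changed after it was archived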
Tar has options to only archive the files changed after a given date, or those listed in a text file. The second method is much more flexible than the first, even when you only want to archive newer files, because you can generate the file list by hand or automatically, with any combination of filtering criteria that can be coded into a script.
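Here is how the two methods look in practice (the date, folder and list names are only examples):
tar cf newer_files.tar -N 2011-05-01 /home/marco # archive only files changed after May 1st, 2011
find /home/marco -name '*.odt' > odt_list # build the list with whatever criteria you like...
tar cf odt_files.tar -T odt_list # ...then archive exactly the files named in it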
A Practical Example: Combining Full and Incremental Backups
The Bash script of Listing 1 can perform both full and incremental backups for all the users of a GNU/Linux box. The backups are stored in the $TARGETDIR directory, inside a sub-folder named after the current date. Each user (or the administrator) lists in separate files all and only the files or folders that need backup. Configuration files, mailboxes and other documents are backed up separately, for each of the users listed in the $USERS variable (see lines 6 and 7). When the script is launched without arguments, tar is used as in line 24: the “cf” options create an archive named $TARGETDIR/$DATE/$USER$LIST.tar, -W verifies the archive after writing it, warning you if something went wrong, and -T gives the name of the file list to use. If you run the script with the “incr” option (see line 13), instead, the script creates the file lists on the spot, through the find command in line 17: “-mtime -1” means “find all the files created or modified no more than one day ago”.
1 #! /bin/bash
2 # Basic full/incremental backup script
3
4 TARGETDIR=/mydata/backups
5 DATE=`date +'%Y%m%d'`
6 FILE_LISTS='docs email config'
7 USERS='marco fabrizio'
8
9 cd /
10 mkdir $TARGETDIR/$DATE || exit
11 for USER in `echo $USERS`
12 do
13 if [ "$1" == "incr" ]
14 then
15 echo "Incremental backup of $USER files"
16 INCR_LIST="/tmp/$DATE.$USER.incr_file_list"
17 find /home/$USER -type f -mtime -1 | cut -c2- > $INCR_LIST
18 tar cf $TARGETDIR/$DATE/incr_$USER.tar -W -T $INCR_LIST
19 rm $INCR_LIST
20 else
21 for LIST in `echo $FILE_LISTS`
22 do
23 echo " Backing up into $TARGETDIR/$DATE the content of $USER / $LIST"
24 tar cf $TARGETDIR/$DATE/$USER$LIST.tar -W -T /home/$USER/.$LIST.filelst
25 done
26 fi
27 done
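Each of the .$LIST.filelst files used at line 24 is just a plain text list of the files and folders to save, one per line. Since the script works from / (line 9), relative paths are the natural choice, but absolute ones would also work, because tar simply strips the leading slash with a warning. A hypothetical .docs.filelst for user marco might look like this:
home/marco/Documents
home/marco/my_resume.odt
home/marco/invoices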
To automate everything, use the cron utility. If you save the script of Listing 1 as /usr/local/bin/mybackupscript.sh, these two crontab entries:
30 23 * * 0 /usr/local/bin/mybackupscript.sh
30 23 * * 1,2,3,4,5,6 /usr/local/bin/mybackupscript.sh incr
will create full backups at 11:30 pm every Sunday (day 0 in the fifth column) and incremental ones at the same time on every other day.
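One simple way to install them, assuming you run the backups as root so that every user’s files are readable, is:
crontab -e # as root: opens root's cron table in a text editor, ready for the two lines above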
Tar has many more features than those used in this example. Besides the manual, the best resource to see what GNU tar can do is the online gallery of tar commands at CommandLineFu.