Author: Preston St. Pierre
On the surface, my Web server seemed to be running fine, serving up Web pages normally. While performing some routine maintenance, I happened to run an ls command in the root directory, and it returned an empty listing: no directories or files. Nothing. The ls command did show files in the /boot directory, so it appeared there was some file system damage. Checking the system message log showed hundreds of these alarming entries:
Dec 21 01:05:01 linux01 kernel: EXT3-fs error (device cciss0(104,1)): ext3_new_block: Allocating block in system zone - block = 96
Errors in the ext3 file system are never a good sign. I decided to boot into rescue mode and run a file system check (fsck) to clear up any problems with the file system. The fsck turned up a lot of bad directory entries and offered to repair them. Since there were so many, I ran the command again with the automatic repair switch. After some time and a lot of scary messages, the repair finished. I held my breath and rebooted. The boot loader failed to find the kernel. Another reboot into rescue mode showed that the file system was now clean, a little too clean. There were no files left. Only the /lost+found directory survived the repair, and it was empty.
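For the record, the rescue-mode commands looked roughly like the sketch below. The device name here is my reconstruction from the cciss (HP Smart Array) driver named in the log entry above, not a transcript; substitute your own root partition, and think twice before reaching for -y as I did.

# Interactive check: prompts before each repair
fsck.ext3 /dev/cciss/c0d0p1

# "Automatic repair": answers yes to every prompt
# (this is the pass that left me with an empty disk)
fsck.ext3 -y /dev/cciss/c0d0p1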
Want to get away?
In eight years of running Linux, I had never needed to completely restore a production system. Most restores were for individual files or directories. In moments like this, it is evident how important a well-tested backup system is.
My system runs a shell script via cron to do a full backup every night. I prefer doing full backups rather than incremental or differential ones when possible. If the amount of data you have to back up is too large, you may have to use another backup strategy. In my case, the entire system and all its content take up about 10GB, easily fitting on a single DLT tape.
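The cron entry itself is a one-liner. This is a sketch rather than my actual crontab, and the script path is hypothetical:

# Run the full backup at 1:00 a.m. every night
0 1 * * * /usr/local/sbin/backup.sh >> /var/log/backup-cron.log 2>&1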
The backup script uses tar. It doesn't compress the backups in software, because the tape drive compresses them in hardware. While there are plenty of fancy backup systems available, both open source and commercial, I rely on the simple, tried-and-true GNU version of tar. It gives me a backup system that is easy to understand and use. The administrators of non-Linux servers at my site use an expensive, name-brand commercial backup system, but their backups are not nearly as reliable. On the other hand, tar does not offer some of the advanced features that certain environments may need.
Here is the backup script I used:
#!/bin/sh
# Backup
#
# This script backs up all server data to tape.

# Delete old mysql dumps
rm -f -r /backup/mysql
mkdir /backup/mysql

# Dump all mysql databases
mysqldump --add-drop-table -A --user=root --password=xxxx > /backup/mysql/databases.sql

# Initialize the tape drive
if /bin/mt -f "/dev/nst0" tell > /dev/null 2>&1
then
    # Some drives require zeroing the data before
    # they can be overwritten.
    /bin/mt -f "/dev/nst0" rewind > /dev/null 2>&1
    /bin/dd if=/dev/zero of="/dev/nst0" bs=32k count=1 > /dev/null 2>&1
    /bin/mt -f "/dev/nst0" rewind > /dev/null 2>&1
else
    echo "Backup aborted: No tape loaded"
    exit 1
fi

# Do backup
/bin/tar --create --verbose --preserve --ignore-failed-read --file=/dev/nst0 / > /backup/filelist.txt

# Add completion date to filelist
echo "Backup complete on " `date` >> /backup/filelist.txt

/bin/mt -f "/dev/nst0" rewind
/bin/mt -f "/dev/nst0" eject
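In hindsight, one cheap addition would be a verification pass. GNU tar's --compare (-d) mode reads the tape back and checks it against the live file system. Run from / before the final eject, a sketch like this would flag a bad tape the same night it was written (expect some noise from files, such as logs, that changed during the backup):

# Rewind and compare the archive against the file system
/bin/mt -f "/dev/nst0" rewind
cd /
/bin/tar --compare --file=/dev/nst0 > /backup/verify.txt 2>&1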
The mt (magnetic tape) command is used to control SCSI tape drives. The script includes a dump of all MySQL databases to text files. Although MyISAM-format databases in MySQL can be restored directly from tape, it is safer to restore the databases from a dump.
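Reloading such a dump is a single command. Because the dump was made with --add-drop-table, feeding it back to the mysql client drops and recreates each table:

mysql --user=root --password=xxxx < /backup/mysql/databases.sql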
The script also saves the list of files backed up to a log file (/backup/filelist.txt). When tar runs, it sends the names of the files it is copying to standard output, and the script redirects them to the log file. You need the file list to do an efficient bare-metal restore from tape (more on that later). Notice the --preserve option in the tar command; it ensures that file permissions are saved along with the files.
Just like starting over
Since the system would not boot, my first step was to get at least a minimal system running so I could run a restore. I dug out my RHEL CDs and started the install process. There is a handy option during the install, at the bottom of the package selection screen, called “Minimal Install.” I chose that option to save time and because I planned to overwrite the system later when I restored it from tape.
One tricky point about a full restore is that the directories need to be restored in the order they exist on the tape. The tape drive is a serial device that can’t read backward. So, if the first file you restore is halfway through the tape, you can’t go back and restore something from the first part of the tape without rewinding it and running another restore. That’s why the filelist.txt log file is so important. Of course, the filelist.txt file was destroyed with the rest of the system, so that was the first file I had to restore.
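Recovering that one file is straightforward, if slow. tar strips the leading slash when it writes the archive, so you ask for the relative name; a sketch of that first single-file restore:

# Rewind, then pull just the file list out of the archive
/bin/mt -f "/dev/nst0" rewind
cd /
/bin/tar --extract --verbose --preserve --file=/dev/nst0 backup/filelist.txt

tar scans the archive from the beginning until it finds the name; with a newer GNU tar you can add --occurrence=1 to stop reading after the first match.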
The backup tape from the previous night was still on site (our off-site rotations happen once a week). Once I restored the filelist.txt file, I browsed through the list to determine the order that the directories were written to the tape. Then, I placed that list in this restore script:
#!/bin/sh
# Restore everything
# This script restores all system files from tape.
#

# Initialize the tape drive
if /bin/mt -f "/dev/nst0" tell > /dev/null 2>&1
then
    # Rewind before restore
    /bin/mt -f "/dev/nst0" rewind > /dev/null 2>&1
else
    echo "Restore aborted: No tape loaded"
    exit 1
fi

# Do restore
# The directory order must match the order on the tape.
#
/bin/tar --extract --verbose --preserve --file=/dev/nst0 var etc root usr lib boot bin home sbin backup

# note: in many cases, these directories don't need to be restored:
# initrd opt misc tmp mnt

# Rewind tape when done
/bin/mt -f "/dev/nst0" rewind
In the script, the list of directories to restore is passed as parameters to tar. Just as in the backup script, it is important to use the --preserve switch so that file permissions are restored to the way they were before the backup. I could have just restored the / directory, but there were a couple of directories I wanted to exclude, so I decided to be explicit about what to restore. If you want to use this script for your own restores, be sure the list of directories matches the order in which they were backed up on your system.
Although /boot appears in the restore script above, I left it out of my actual restore, because I suspected my file system problem was related to a kernel upgrade I had done three days earlier. By not restoring the /boot directory, the system would continue to use the stock kernel that shipped on the CDs until I upgraded it. I also wanted to exclude the /tmp directory and a few other directories that I knew were not important.
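Had I wanted to restore everything except those directories, tar's --exclude option sketches the other approach (patterns are relative, since tar strips the leading slash from archive members). Extracting the whole archive naturally proceeds in tape order, so no directory list is needed, at the cost of less explicit control over what comes back:

# Restore everything on the tape except /boot and /tmp contents
/bin/tar --extract --verbose --preserve --file=/dev/nst0 --exclude='boot/*' --exclude='tmp/*'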
The restore ran for a long time, but uneventfully. Finally, I rebooted the system, reloaded the MySQL databases from the dumps, and the system was fully restored and working perfectly. Just over four hours elapsed from total meltdown to complete restore. I probably could trim at least an hour off that time if I had to do it a second time.
Postmortem
I filed a bug report with Red Hat Bugzilla, but I could only provide log files from the day before the crash. All core files and logs from the day of the crash were lost when I tried to repair the file system. I exchanged posts with a Red Hat engineer, but we were not able to nail down the cause. I suspect the problem was either in the RAID driver code or ext3 code. I should note that the server is a relatively new HP ProLiant server with an Intel hyperthreaded Pentium 4 processor. Because the Linux kernel sees a hyperthreaded processor as a dual processor, I was using an SMP kernel when the problem arose. I reasoned that I might squeeze a few percentage points of performance out of the SMP kernel. This bug may only manifest when running on a hyperthreaded processor in SMP mode. I don’t have a spare server to try to recreate it.
After the restore, I went back to the uniprocessor kernel and have not yet patched it back up to the level it had been. Happily, the ext3 error has not returned. I scan the logs every day, but it has been well over a month since the restore and there are still no signs of trouble. I am looking forward to my next full restore — hopefully not until sometime in 2013.
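The daily scan is nothing elaborate; a line like this, run by hand or from cron, is enough to catch a recurrence (log path per Red Hat's default syslog setup):

grep 'EXT3-fs error' /var/log/messages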