Recovering Linux files and filesystems

Author: Roderick W. Smith

Last time we talked about resizing and defragmenting Linux filesystems. This time, in the final article in this series, we’ll look at how to recover lost files.

This article is excerpted from the recently published book Linux Power Tools.

Recovering from filesystem corruption

A common nightmare among computer users is losing data due to filesystem corruption. This can happen because of errors introduced during a system crash, filesystem driver bugs (particularly when using non-Linux drivers to access Linux partitions), human error involving low-level disk utilities, or other factors. All Linux filesystems include disk-check tools, but they differ in many details.

Most filesystems provide utilities that scan the filesystem’s contents for internal consistency. These tools can detect, and often correct, errors such as mangled directories, bad time stamps, inodes that point to the wrong part of the disk, and so on. In Linux, the fsck utility serves as a front-end to filesystem-specific checking tools, which usually have names of the form fsck.filesystem, where filesystem is the filesystem name, such as jfs or ext2. If you need to check a filesystem manually, you can either call fsck, which then calls the filesystem-specific utility, or call the filesystem-specific program directly.

Precisely what each fsck program does is highly filesystem-specific. These programs often perform multiple passes through the filesystem. Each pass checks for a particular type of problem. On a badly corrupted partition, you may need to run fsck several times to fix all the problems. You may also need to answer questions posed by fsck for some types of errors. Unfortunately, these questions tend to be cryptic to any but experienced filesystem gurus. Fortunately, they usually accept yes/no responses, so you can guess (the default value generally produces acceptable results).

If your partition is badly corrupted, and if you have enough unused space on another partition or hard disk, try copying the entire partition’s data with dd as a backup prior to running fsck. For instance, typing dd if=/dev/sda5 of=/dev/hda7 backs up the /dev/sda5 partition to /dev/hda7, which must be at least as large as the source and whose existing contents are overwritten. You can then restore that image and try other options if fsck doesn’t work properly. You can also try adding conv=noerror to a dd command to copy a partition on a drive that’s producing physical read errors; combining it with sync (conv=noerror,sync) pads each unreadable block with zeros so that the rest of the data stays at its original offsets. It’s possible that fsck will then be able to recover data from the copied partition even if it fails on the original because of the read errors.
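
As a rough illustration of the dd backup described above, the following shell sketch wraps the copy in a function. The function name image_partition and the 4096-byte default block size are assumptions made for this example, not part of dd itself, and the destination may be either a spare partition or an ordinary file on another disk.

```shell
# Copy a partition (or any file) block by block, continuing past read errors.
# Usage: image_partition SOURCE DESTINATION [BLOCK_SIZE]
# Hypothetical helper; the name and the default block size are illustrative.
image_partition() {
    src=$1 dst=$2 bs=${3:-4096}
    # conv=noerror keeps going after a read error;
    # conv=sync pads every short or failed read with zeros up to BLOCK_SIZE,
    # so later data stays at the same offset in the copy.
    dd if="$src" of="$dst" bs="$bs" conv=noerror,sync
}

# Example (as root, with the source unmounted or mounted read-only):
#   image_partition /dev/sda5 /mnt/spare/sda5.img
#   fsck.ext2 -f /mnt/spare/sda5.img    # work on the copy, not the original
```

Because sync pads the final block, the copy can be slightly larger than a source whose size is not a multiple of the block size. A smaller block size loses less data around each bad sector but makes the copy slower.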

During system startup, Linux examines the filesystem entry in /etc/fstab to determine whether to run fsck automatically. Specifically, the final field for a filesystem entry is a code specifying when to check the filesystem. A value of 0 means not to run fsck on startup. This setting is appropriate for non-Linux filesystems. The root (/) filesystem should normally have a value of 1, meaning that it’s checked first. All other Linux native filesystems should normally have values of 2, meaning that they’re checked after the root filesystem. Filesystem checks at boot time can cause problems in some distributions and with some journaling filesystems, though. Specifically, the filesystem check tool may slow down the bootup process because it unnecessarily forces a thorough disk check. For this reason, these checks are sometimes disabled (with a 0 value in the final /etc/fstab field), particularly for ReiserFS, and sometimes for ext3fs and XFS. These filesystems are all capable of automatically replaying their journals when mounted, so an explicit filesystem check at boot time normally isn’t necessary.
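
To see which filesystems are scheduled for a boot-time check, you can read the final field directly. The following is a minimal sketch; the function name fsck_order is made up for this example, and the optional file argument exists only so that the function can be tried on a sample file instead of the real /etc/fstab.

```shell
# Print "<mount point> <fsck pass number>" for each entry in an fstab file.
# Usage: fsck_order [FSTAB_FILE]
# Hypothetical helper; defaults to /etc/fstab.
fsck_order() {
    # Skip comments and blank lines. A missing sixth field is treated as 0,
    # which means "do not check at boot".
    awk '!/^[ \t]*(#|$)/ {
        pass = ($6 == "" ? 0 : $6)
        print $2, pass
    }' "${1:-/etc/fstab}"
}

# Example:
#   fsck_order | sort -k2,2n
```

Entries that report 0 are skipped at boot, the entry that reports 1 is checked first, and entries that report 2 are checked afterward.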

Filesystem check options

The main fsck program supports several options, which it passes on to the filesystem-specific programs as necessary. Most of the interesting options are filesystem-specific. Some of the things you can specify for particular filesystems include:

Backup superblocks

The ext2 and ext3 filesystems use superblocks, which contain many vital filesystem details. Because the superblocks are so important, these filesystems store backup superblocks on the disk. If the main superblock is damaged, fsck.ext2 can use a backup whose location you specify with -b. You can learn the location of the backup superblocks by typing mke2fs -n /dev/file, where /dev/file is the device file for the partition. The locations it reports are reliable only if you pass mke2fs the same options (such as the block size) that were used when the filesystem was created. Be sure to include the -n option or mke2fs will create a new filesystem, destroying the old one!

Detecting bad blocks

You can have fsck.ext2 scan for physically bad sectors on the disk and mark them as such by passing it the -c option. Doubling up this option (that is, using -cc) performs a non-destructive read/write test, which may catch problems that a read-only test will miss.

Forcing a check

Normally, most fsck utilities perform only minimal checks if the filesystem is marked as clean. You can force a check with the fsck.ext2 and fsck.jfs utilities by passing them the -f option. In reiserfsck, the --check option causes it to check for errors but not correct them, while --fix-fixable causes it to automatically fix certain types of errors.

Journal location

To tell fsck.ext2, fsck.jfs, or reiserfsck where an external journal is located, pass the location with the -j journal-location parameter.

Minimizing user interaction

The -p option to fsck.ext2 and fsck.jfs minimizes the questions they ask. The -n and -y options to fsck.ext2 cause it to assume answers of no or yes, respectively, to all questions. (A no response causes fsck to not make changes to the filesystem.) The -s option to xfs_check causes it to report only serious errors. This option can make it easier to spot a problem that prevents mounting the filesystem when the filesystem also has less important problems.

Rebuilding the superblock

The reiserfsck program supports rebuilding the superblock by using the --rebuild-sb option.

Rebuilding the filesystem tree

You can rebuild the entire tree of directories and files by using the --rebuild-tree option to reiserfsck. This option is potentially very risky, but it can fix major problems. Back up the partition using dd before using this option.

Omit replaying the journal

The -o option to fsck.jfs causes the program to omit replaying the journal. This option is most useful when the journal itself has been corrupted.
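
One low-risk way to get familiar with these options is to try them on a scratch filesystem stored in an ordinary file rather than on a real partition. The sketch below assumes that the e2fsprogs tools (mke2fs and fsck.ext2) are installed; the function names and the 8MB image size are arbitrary choices for this example.

```shell
# Create a small ext2 filesystem inside a regular file (root is not needed).
# Usage: make_scratch_fs IMAGE_FILE
make_scratch_fs() {
    img=$1
    # 8192 blocks of 1024 bytes = 8MB of zeros
    dd if=/dev/zero of="$img" bs=1024 count=8192 2>/dev/null || return 1
    # -F lets mke2fs work on something that is not a block device;
    # -q suppresses its progress output.
    mke2fs -q -F "$img"
}

# Force a full check (-f) even though the filesystem is marked clean, and
# answer "no" to every question (-n) so that nothing is modified.
# Usage: check_scratch_fs IMAGE_FILE
check_scratch_fs() {
    fsck.ext2 -f -n "$1"
}

# Example:
#   make_scratch_fs /tmp/scratch.img && check_scratch_fs /tmp/scratch.img
```

The same image file can be used to experiment with other options, such as -p or -y, before you need them on a damaged partition.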

You don’t normally need to use fsck except when it runs into problems after a reboot. In that case, Linux may drop you into a maintenance shell and advise you to run fsck manually. Of course, you can run this utility at your discretion, as well, and doing so may be advisable if your disk access is acting up — if files are disappearing or if the disk appears to have too much or too little free space, for instance.

Never run fsck on a filesystem that’s mounted for read/write access. Doing so can confuse Linux and lead to more filesystem corruption. If necessary, shut down the system and boot from an emergency system so that you can run fsck on a filesystem that’s unmounted or mounted for read-only access.
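
A quick check of the mount table before running fsck can help you avoid this mistake. The following sketch reads /proc/mounts; the function names mount_mode and safe_to_fsck are invented for this example, and the check assumes that the device appears in /proc/mounts under the same name you pass in, which may not hold for symbolic links, UUID-based mounts, or names such as /dev/root.

```shell
# Report how a device is mounted: prints "rw", "ro", or "unmounted".
# Usage: mount_mode DEVICE [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts; it is a parameter only so that the
# function can be tried against a sample file.
mount_mode() {
    awk -v dev="$1" '
        $1 == dev {
            found = 1
            # The fourth field is a comma-separated list of mount options.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "rw") rw = 1
        }
        END {
            if (!found)  print "unmounted"
            else if (rw) print "rw"
            else         print "ro"
        }
    ' "${2:-/proc/mounts}"
}

# Succeed unless the device is mounted read/write.
# Usage: safe_to_fsck DEVICE [MOUNTS_FILE]
safe_to_fsck() {
    [ "$(mount_mode "$@")" != "rw" ]
}

# Example:
#   safe_to_fsck /dev/sda5 && fsck /dev/sda5
```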

Journaling filesystems: Not a panacea

Many people seem to think of journaling filesystems as protection against filesystem errors. This isn’t their primary purpose, though. Journaling filesystems are designed to minimize filesystem check times after a crash or other severe error. Ext2fs has a good reputation as a reliable filesystem. Linux’s journaling filesystems also have good reputations, but you shouldn’t assume they’re any safer than ext2fs.

Because journaling filesystems minimize system startup time after power outages, some people have taken to shutting off Linux systems that use journaling filesystems without shutting Linux down properly. This practice is risky at best. Linux still caches accesses to journaling filesystems, so data from recently written files may be lost if you power off the computer without shutting it down first. Always shut down the computer with the shutdown utility (or a program that calls it, such as a GUI login screen) before turning off the power.

Recovering deleted files

Perhaps the most common type of filesystem problem is files that are accidentally deleted. Users frequently delete the wrong files or delete a file only to discover that it’s actually needed. Windows system users may be accustomed to undelete utilities, which scour the disk for recently deleted files in order to recover them. Unfortunately, such tools are rare on Linux. You can make undeletion easier by encouraging the use of special utilities that don’t really delete files, but instead place them in temporary holding areas for deletion later. If all else fails, you may need to recover files from a backup.

Trash can utilities

One of the simplest ways to recover “deleted” files is to not delete them at all. This is the idea behind a trash can — a tool or procedure to hold onto files that are to be deleted without actually deleting them. These files can be deleted automatically or manually, depending on the tool or procedure. The most familiar form of trash can utility for most users, and the one from which the name derives, is the trash can icon that exists in many popular GUI environments, including KDE and GNOME. To use a GUI trash can, you drag files you want to delete to its icon. The icon is basically just a pointer to a specific directory that’s out of the way or hidden from view, such as ~/Desktop/Trash or ~/.gnome-desktop/Trash. When you drag a file to the trash can, you’re really just moving it to that directory. If you subsequently decide you want to undelete the file, you can click or double-click the trash can icon to open a file browser on the trash directory. This enables you to drag the files you want to rescue out of the trash directory. Typically, files are only deleted from the trash directory when you say so by right-clicking the trash can icon and selecting an option called Empty Trash or something similar.

When you’re working from the command line, the rm command is the usual method of deleting files, as in rm somefile.txt. This command doesn’t use anything akin to the trash directory by default, and depending on your distribution and its default settings, rm may not even prompt you to be sure you’re deleting the files you want to delete. You can improve rm’s safety considerably by forcing it to confirm each deletion by using the -i option, as in rm -i somefile.txt. In fact, you may want to make this the default by creating an alias in your shell startup scripts. For instance, the following line in ~/.bashrc or /etc/profile will set up such an alias for bash:

alias rm='rm -i'

This configuration can become tedious if you use the -r option to delete an entire directory tree, though, or if you simply want to delete a lot of files by using wildcards. You can override the alias by specifying the complete path to rm (/bin/rm) when you type the command.

Forcing confirmation before deleting files can be a useful preventive measure, but it’s not really a way of recovering deleted files. One simple way to allow such recovery is to mimic the GUI environments’ trash cans — instead of deleting files with rm, move them to a holding directory with mv. You can then empty the holding directory whenever it’s convenient. In fact, if you use both a command shell and a GUI environment that implements a trash can, you can use the same directory for both.

If you or your users are already familiar with rm, you may find it difficult to switch to using mv. It’s also easy to forget how many files have been moved into the trash directory, and so disk space may fill up. One solution is to write a simple script that takes the place of rm, but that moves files to the trash directory. This script can simultaneously delete files older than a specified date or delete files if the trash directory contains more than a certain number of files. Alternatively, you could create a cron job to periodically delete files in the trash directory. An example of such a script is saferm. To use saferm or any similar script, you install it in place of the regular rm command, create an alias to call the script instead of rm, or call it by its true name. For instance, the following alias will work:

alias rm='saferm'

In the case of saferm, the script prompts before deleting files, but you can eliminate the prompt by changing the line that reads read answer to read answer=A and commenting out the immediately preceding echo lines. The script uses a trash directory in the user’s home directory, ~/.trash. When users need to recover “deleted” files, they can simply move them out of ~/.trash. This specific script doesn’t attempt to empty the trash bin, so users must do this themselves using the real rm; or you or your users can create cron jobs to do the task.
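
If you’d rather write your own replacement, the general idea can be sketched in a few lines of shell. This is not the saferm script; the function names trash and purge_trash, the TRASH_DIR variable, and the 30-day default are all assumptions made for this illustration.

```shell
# Directory that holds "deleted" files; override TRASH_DIR to change it.
TRASH_DIR=${TRASH_DIR:-$HOME/.trash}

# Move files into the trash directory instead of deleting them.
# Usage: trash FILE...
trash() {
    mkdir -p "$TRASH_DIR" || return 1
    for f in "$@"; do
        dest=$TRASH_DIR/$(basename "$f")
        # Don't overwrite an earlier "deletion" that has the same name:
        # append .1, .2, ... until the name is unused.
        if [ -e "$dest" ]; then
            n=1
            while [ -e "$dest.$n" ]; do n=$((n + 1)); done
            dest=$dest.$n
        fi
        mv -- "$f" "$dest" || return 1
    done
}

# Permanently remove trash entries last modified more than DAYS days ago.
# Usage: purge_trash [DAYS]
# Note that mv keeps a file's modification time, so this measures the age
# of the file's contents, not the time it has spent in the trash.
purge_trash() {
    days=${1:-30}
    [ -d "$TRASH_DIR" ] || return 0
    find "$TRASH_DIR" -mindepth 1 -maxdepth 1 -mtime +"$days" \
        -exec rm -rf -- {} +
}
```

Users can call trash in place of rm, and purge_trash is the sort of command that could be placed in a shell script and run from a cron job to keep the trash directory from filling the disk.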

File recovery tools

Undelete utilities for Linux are few and far between. The Linux philosophy is that users shouldn’t delete files they really don’t want to delete, and if they do, they should be restored from backups. Nonetheless, in a pinch there are some tricks you can use to try to recover accidentally deleted files.

One of these tricks is the recover utility, available with most Linux distributions. Unfortunately, this tool has several drawbacks. The first is that it was designed for ext2fs, and so it doesn’t work with most journaling filesystems. (It may work with ext3fs, though.) Another problem is that recover takes a long time to do anything, even on small partitions. I frequently see network programs such as web browsers and mail clients crash when recover runs. Finally, in my experience, recover frequently fails to work at all; if you type recover /dev/sda4, for instance, to recover files from /dev/sda4, the program may churn for a while, consume a lot of CPU time, and return with a Terminated notice. In sum, recover isn’t a reliable tool, but you might try it if you’re desperate. If you do try to run it, I recommend shutting down unnecessary network-enabled programs first.

Another method of file recovery is to use grep to search for text contained in the file. This approach is unlikely to work on anything but text files, and even then it may return a partial file or a file surrounded by text or binary junk. To use this approach, you type a command such as the following:

# grep -a -B5 -A100 "Dear Senator Jones" /dev/sda4 > recover.txt

This command searches for the text Dear Senator Jones on /dev/sda4 and returns the five lines before (-B5) and the 100 lines after (-A100) that string. The redirection operator stores the results in the file recover.txt. Because this operation involves a scan of the entire raw disk device, it’s likely to take a while. (You can speed matters up slightly by omitting the redirection operator and instead cutting and pasting the returned lines from an xterm into a text editor; this enables you to hit Ctrl+C to cancel the operation once it’s located the file. Another option is to use script to start a new shell that copies its output to a file, so you don’t need to copy text into an editor.) This approach also works with any filesystem. If the file is fragmented, though, it will only return part of the file. If you misjudge the size of the file in lines, you’ll either get just part of the file or too much — possibly including binary data before, after, or even within the target file.
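
Before trying this on a real partition, it can help to rehearse the technique on a small test file. The sketch below wraps the grep command in a function; the name grep_recover and its default context sizes are assumptions for this example, and the -F option (which treats the search text as a literal string rather than a regular expression) is an addition to the command shown earlier.

```shell
# Search a raw device or image file for a text string and print the
# surrounding lines.
# Usage: grep_recover TEXT DEVICE [LINES_BEFORE] [LINES_AFTER]
# Hypothetical helper; defaults mirror the -B5 -A100 example.
grep_recover() {
    pattern=$1 device=$2 before=${3:-5} after=${4:-100}
    # -a  treat binary data as text instead of reporting only "matches"
    # -F  match the string literally
    grep -a -F -B "$before" -A "$after" -e "$pattern" "$device"
}

# Example (as root, writing the result to a DIFFERENT filesystem so that
# the output doesn't overwrite the data you are trying to recover):
#   grep_recover "Dear Senator Jones" /dev/sda4 > /mnt/other/recover.txt
```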

Restoring files from a backup

Emergency recovery procedures — restoring most or all of a working system from a backup — are useful after a disk failure, security breach, or a seriously damaging administrative blunder. System backups can also be very useful in restoring deleted files. In this scenario, an accidentally deleted file can be restored from a backup. One drawback to this procedure is that the original file must have existed prior to the last regular system backup. If your backups are infrequent, the file might not exist. Even if you make daily backups, this procedure is unlikely to help if a user creates a file, quickly deletes it, and then wants it back immediately. A trash can utility is the best protection against that sort of damage.

As an example, suppose you create backups to tape using tar. You can recover files from this backup by using the --extract (-x) command. Typically, you also pass the --verbose (-v) option so that you know when the target file has been restored, and you use --file (-f) to point to the tape device file. You must also pass the name of the file to be restored:

# tar -xvf /dev/st0 home/al/election.txt

This command recovers the file home/al/election.txt from the /dev/st0 tape device. A few points about this command require attention:

Permissions

The user who runs the command must have read/write access to the tape device. This user must also have write permission to the restore directory (normally, the current directory). Therefore, root normally runs this command, although other users may have sufficient privileges on some systems. Ownership and permissions on the restored file may change if a user other than root runs the command.

Filename specification

The preceding command omitted the leading slash (/) in the target filename specification (home/al/election.txt). This is because tar normally strips this slash when it writes files, so when you specify files for restoration, the slash must also be missing. A few utilities and methods of creating a backup add a leading ./ to the filename. If your backups include this feature, you must include it in the filename specification to restore the file.

Restore directory

Normally, tar restores files to the current working directory. Thus, if you type the preceding command while in /root, it will create a /root/home/al/election.txt file (assuming it’s on the tape). I recommend restoring to an empty subdirectory and then moving the restored file to its intended target area. This practice minimizes the risk that you might mistype the target file specification and overwrite a newer file with an older one, or even overwrite the entire Linux installation with the backup.
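
The practice of restoring into an empty directory can be sketched as a small shell function. The name restore_to_scratch is made up for this example; the function relies on tar’s -C option to change into the scratch directory before extracting.

```shell
# Extract the named files from a tar archive into a new, empty directory
# and print that directory's path.
# Usage: restore_to_scratch ARCHIVE FILE...
# Hypothetical helper; ARCHIVE may be a tape device such as /dev/st0.
restore_to_scratch() {
    archive=$1
    shift
    scratch=$(mktemp -d "${TMPDIR:-/tmp}/restore.XXXXXX") || return 1
    # The verbose listing goes to stderr so that stdout carries only the
    # scratch directory name.
    tar -xvf "$archive" -C "$scratch" "$@" >&2 || return 1
    printf '%s\n' "$scratch"
}

# Example:
#   dir=$(restore_to_scratch /dev/st0 home/al/election.txt)
#   mv "$dir/home/al/election.txt" /home/al/election.txt.restored
```

Restoring under a new name, as in the example, leaves any existing copy of the file untouched until you have compared the two.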

Unfortunately, tar requires that you have a complete filename, including its path, ready in order to recover a file. If you don’t know the exact filename, you can try taking a directory of the tape by typing tar tvf /dev/st0 (substituting another tape device filename, if necessary). You may want to pipe the result through less or grep to help you search for the correct filename, or redirect it to a file you can search.

You can keep a record of files on a tape at backup time to simplify searches at restore time. Using the --verbose option and redirecting the results to a file will do the trick. Some incremental backup methods automatically store information on a backup’s contents, too. Some backup tools, such as the commercial Backup/Recover Utility, store an index of files on the tape. This index enables you to quickly scan the tape and select files for recovery from the index.
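
Both approaches, saving an index at backup time and searching the archive itself, might look something like the following. The function names are invented for this example, and the sketch assumes GNU tar, which writes its verbose listing to standard output when the archive goes to a file or tape device.

```shell
# Create an archive and save the verbose file listing as an index.
# Usage: backup_with_index ARCHIVE INDEX_FILE PATH...
# Assumes GNU tar; some other tar implementations send -v output to stderr.
backup_with_index() {
    archive=$1 index=$2
    shift 2
    tar -cvf "$archive" "$@" > "$index"
}

# Search a saved index for a name fragment (fast; no tape needed).
# Usage: search_index TEXT INDEX_FILE
search_index() {
    grep -e "$1" "$2"
}

# Search the archive's own table of contents (slow on tape).
# Usage: search_archive TEXT ARCHIVE
search_archive() {
    tar -tvf "$2" | grep -e "$1"
}

# Example:
#   cd / && backup_with_index /dev/st0 /root/backup-index.txt home etc
#   search_index election /root/backup-index.txt
```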

Summary

Linux, like any OS, is built on its filesystems. The ext2 filesystem has long been the standard for Linux, but over the course of development of the 2.4.x kernels, new journaling filesystems have been added as standard equipment. These filesystems give you several options that vary in subtle ways, such as disk space consumption by different types of files and support for ACLs. Most systems will work well with any Linux filesystem, but if disk performance is critically important to you, you may want to research the options further to pick the best one for your needs. You can also optimize filesystems in various ways, ranging from options at filesystem creation time to defragmenting and resizing filesystems. Unfortunately, filesystems don’t always work perfectly reliably. Sometimes you may need to fix filesystem corruption, and various tools exist to help you do this. Users may also accidentally delete files, and recovering them can be a challenging task, although being prepared by using trash can utilities and performing regular backups can greatly simplify recovery operations.
