File compression tools for Linux


Author: Shashank Sharma

One source of confusion for new Linux users is the variety of archive and compression formats in which downloadable applications are distributed. “Should I use the tar.gz file, the zip file, or the tar.bz2 file?” they may wonder. Here’s what you need to know about these formats so you can unpack any application you download.

First, consider the distinction between archiving and compression. Archiving means combining a number of files together into one file. The idea is to achieve easier storage and transportation. It’s like having a briefcase in which to keep all your files. The archive must contain some information about the original files, such as their names and lengths, for proper reconstruction. This ensures that your paperwork will remain as-is when you open your briefcase. Some popular archive file formats are tar and zip.

Compression, on the other hand, is the process of using encoding schemes to store information in fewer bits than the original representation would use. It’s similar to the difference between shorthand and normal writing: shorthand takes up less paper. Common compression formats are zip, gz, and bz2.

Working with archives

The tar format is the most common archive format on *nix systems. Tar was originally designed for transferring files to and from tape drives — its name is short for “tape archive.” A tar archive is commonly called a tarball.

To archive files, use a command like tar -cf archive.tar file1 file2 file3. This command combines file1, file2, and file3 and stores them in archive.tar. The -c switch tells tar that you want to create an archive. The -f switch tells tar that the next argument, archive.tar, is the name of the archive file to work with.
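Here is a minimal sketch of that command, run in a throwaway directory created with mktemp. The sample file names and contents are made up for illustration. The tar -tf (list contents) step is there only to confirm what went into the archive.

```shell
#!/bin/sh
# Minimal sketch: build a tarball from three small sample files.
# File names and contents are invented for illustration.
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT   # clean up the scratch directory on exit
cd "$workdir"

printf 'first\n'  > file1
printf 'second\n' > file2
printf 'third\n'  > file3

# -c: create a new archive; -f archive.tar: the archive file to write
tar -cf archive.tar file1 file2 file3

# -t lists the members; prints file1, file2 and file3, one per line
tar -tf archive.tar
```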

The command tar -xf archive.tar extracts all the files from archive.tar and stores them in the current directory with their original names.
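To see extraction on its own, the sketch below builds a small archive and then unpacks it inside an empty subdirectory. The directory name restored is a hypothetical choice made to keep the output easy to inspect. Note that tar -xf leaves the archive itself untouched.

```shell
#!/bin/sh
# Minimal sketch: create an archive, then extract it into an empty directory.
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'first\n'  > file1
printf 'second\n' > file2
printf 'third\n'  > file3
tar -cf archive.tar file1 file2 file3

# Extract somewhere empty so the restored files stand out
mkdir restored
cd restored
# -x: extract; -f ../archive.tar: the archive to read from
tar -xf ../archive.tar
ls    # file1, file2 and file3 are back under their original names
```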

Compressing files

In *nix land, gz and bz2 are two of the most common compression formats. Typically you use the gzip utility to create .gz files and bzip2 to create .bz2 files. The fundamental difference is the compression algorithm: bzip2’s algorithm usually produces smaller files than gzip’s. The downside is that bzip2 uses more memory and runs more slowly.

To compress a file using gzip, use the command gzip filename. The result is a file named filename.gz. Thus the command gzip homepage.htm yields homepage.htm.gz.

One thing to remember about gzip is that it replaces the original file with one that has a .gz extension.
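Here is a minimal sketch of the homepage.htm example. The page content is invented, and it is repetitive on purpose so that it compresses well. After gzip runs, only the .gz file is left.

```shell
#!/bin/sh
# Minimal sketch: gzip replaces homepage.htm with homepage.htm.gz.
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample page: 200 identical lines of HTML
i=0
while [ "$i" -lt 200 ]; do
  printf '<p>Hello from the sample page</p>\n'
  i=$((i + 1))
done > homepage.htm

gzip homepage.htm
ls    # prints only homepage.htm.gz; the original file is gone
```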

To uncompress files, use either gzip -d or gunzip.
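The two decompression commands are interchangeable, as this sketch shows. It compresses two hypothetical sample files and then restores one with gzip -d and the other with gunzip. Each command removes the .gz file and puts the original back.

```shell
#!/bin/sh
# Minimal sketch: two equivalent ways to decompress a .gz file.
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'sample text\n' > notes.txt
cp notes.txt copy.txt
gzip notes.txt copy.txt   # each file is compressed separately

gzip -d notes.txt.gz      # restores notes.txt and removes notes.txt.gz
gunzip copy.txt.gz        # same effect, using the gunzip command
cat notes.txt copy.txt
```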

bzip2 works much like gzip. It, too, replaces the original file, in this case with one that has a .bz2 extension. Decompressing .bz2 files is just as easy: use bzip2 -d or bunzip2.
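Here is the same round trip with bzip2, as a minimal sketch. It assumes the bzip2 package is installed, which is not always the case on minimal systems. The sample file is invented.

```shell
#!/bin/sh
# Minimal sketch: bzip2 and bunzip2 mirror gzip and gunzip.
# Assumes the bzip2 package is installed.
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'sample text\n' > notes.txt
bzip2 notes.txt         # replaces notes.txt with notes.txt.bz2
ls                      # prints notes.txt.bz2
bunzip2 notes.txt.bz2   # restores notes.txt; bzip2 -d would do the same
cat notes.txt
```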

Both gzip and bzip2 maintain the ownership and permissions of the original file when compressing.
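You can check the permissions part of this yourself. The sketch below makes a file readable only by its owner (mode 600), compresses it, and prints the mode of the resulting .gz file. Ownership is not shown, because changing ownership generally needs root. The sketch assumes GNU coreutils, whose stat accepts the -c format option.

```shell
#!/bin/sh
# Minimal sketch: gzip carries the original file's permission bits over.
# Assumes GNU coreutils (stat -c); the file name is hypothetical.
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'private\n' > secret.txt
chmod 600 secret.txt      # owner read/write only
gzip secret.txt
stat -c '%a %n' secret.txt.gz   # prints 600 secret.txt.gz
```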

You can also use the zip utility to compress files if you want to share them with friends who use a non-*nix platform. The command zip files.zip file1 file2 file3 compresses the three files, reports how much each one was compressed, and stores them in files.zip. Unlike gzip and bzip2, zip leaves the original files in place. The unzip program extracts the contents of a zip file.

Compressed archives

Unlike zip, which both archives and compresses, the tar format only archives. A freshly created tarball is therefore about the same size as the individual files combined. It is in fact slightly larger, because tar adds a header for each file and pads the archive to fixed-size blocks. To reduce the size of a tarball, you must compress it with either gzip or bzip2:


tar -cf archived.tar file1 file2 file3
gzip archived.tar

This compresses archived.tar and replaces it with archived.tar.gz. Running bzip2 archived.tar instead would produce archived.tar.bz2.
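To see the size reduction, the sketch below runs the two-step process on repetitive sample data, which is invented for illustration. It prints the byte count before and after gzip. The exact numbers depend on your gzip version, so none are given here.

```shell
#!/bin/sh
# Minimal sketch: archive with tar, then compress the tarball with gzip.
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Repetitive sample data (500 lines per file), so compression is obvious
for name in file1 file2 file3; do
  i=0
  while [ "$i" -lt 500 ]; do
    printf 'line %s of %s\n' "$i" "$name"
    i=$((i + 1))
  done > "$name"
done

tar -cf archived.tar file1 file2 file3
wc -c < archived.tar      # size before compression, in bytes
gzip archived.tar         # replaces archived.tar with archived.tar.gz
wc -c < archived.tar.gz   # size after compression, in bytes
```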

How do you extract files from a compressed tarball? Use tar zxvf archived.tar.gz to extract all the files from a gzip-compressed tarball. The z switch tells tar that the tarball was compressed with gzip. The x switch means extract, v (verbose) lists each file as it is extracted, and f names the archive file.
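Here is a minimal sketch that builds a gzip-compressed tarball from invented sample files and then unpacks it in one step with tar zxvf. The extraction happens in a separate directory, which is a hypothetical choice made only to keep the results easy to see.

```shell
#!/bin/sh
# Minimal sketch: extract a gzip-compressed tarball in one step.
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'first\n'  > file1
printf 'second\n' > file2
tar -cf archived.tar file1 file2
gzip archived.tar           # leaves archived.tar.gz

mkdir extracted
cd extracted
# z: decompress with gzip, x: extract, v: list each file, f: archive name
tar zxvf ../archived.tar.gz
```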

If you used bzip2 to compress the tarball, tar’s z switch would produce an error message. To decompress a bzip2-compressed tarball, use the j switch in its place: tar jxvf archived.tar.bz2 extracts the files.
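Here is the bzip2 version of the same round trip, as a minimal sketch. It assumes bzip2 is installed, and the sample files are invented. Only the compression command and the z/j switch differ from the gzip case.

```shell
#!/bin/sh
# Minimal sketch: compress a tarball with bzip2 and extract it with tar's j switch.
# Assumes the bzip2 package is installed.
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'first\n'  > file1
printf 'second\n' > file2
tar -cf archived.tar file1 file2
bzip2 archived.tar          # leaves archived.tar.bz2

mkdir extracted
cd extracted
# j: decompress with bzip2; x, v and f mean the same as before
tar jxvf ../archived.tar.bz2
```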

You may also encounter compressed tarballs with the extensions .tgz and .tbz2. These are simply shorthand for .tar.gz and .tar.bz2, respectively, and you extract them the same way.

Though it may seem complex at first, that covers the basics of archiving and compression under *nix. Whether a download arrives as a zip file, a tar.gz, or a tar.bz2, you’ll know how to unpack it.

Shashank Sharma is studying for a degree in computer science. He specializes in writing about free and open source software for new users.