Author: Shashank Sharma
First, consider the distinction between archiving and compression. Archiving means combining a number of files together into one file. The idea is to achieve easier storage and transportation. It’s like having a briefcase in which to keep all your files. The archive must contain some information about the original files, such as their names and lengths, for proper reconstruction. This ensures that your paperwork will remain as-is when you open your briefcase. Some popular archive file formats are tar and zip.
Compression, on the other hand, is the process of using encoding schemes to store information in fewer bits than traditional representation would use. It’s similar to the difference between shorthand writing and normal writing, where the former requires less paper than the latter. Common compression formats are zip, gz, and bz2.
Working with archives
The tar format is the most common archive format on *nix systems. Tar was originally designed for transferring files to and from tape drives — its name is short for “tape archive.” A tar archive is commonly called a tarball.
To archive files, use a command like tar -cf archive.tar file1 file2 file3. This command combines file1, file2, and file3 and stores them in archive.tar. The -c switch tells tar that you want to create an archive. The -f switch tells tar that the next argument, archive.tar, is the name of the archive file to work on.
The command tar -xf archive.tar extracts all the files from archive.tar and stores them in the current directory with their original names.
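As a rough illustration of the create and extract steps, the sketch below builds a tarball in a scratch directory, lists it, and unpacks it somewhere else. The sample file contents and the restored directory are made up for the example, and the -t (list) and -C (change directory) switches are additions that the text above does not cover.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three small sample files (names match the text, contents are invented).
printf 'one\n'   > file1
printf 'two\n'   > file2
printf 'three\n' > file3

# -c creates an archive, -f names the archive file to write.
tar -cf archive.tar file1 file2 file3

# -t lists the contents without extracting anything.
tar -tf archive.tar

# -x extracts; -C makes tar switch to the target directory first.
mkdir restored
tar -xf archive.tar -C restored
ls restored
```

Listing a tarball with -t before extracting it is a cheap way to check what it will write into the current directory.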
Compressing files
In *nix land, gz and bz2 are two of the most common compression formats. Typically you use the gzip utility to create gz files and bzip2 to create bz2 files. The fundamental difference is in the compression algorithm: the one used by bzip2 usually results in smaller files. The downside is that bzip2 runs more slowly and eats up more memory.
To compress a file using gzip, use the command gzip filename. The result is a file named filename.gz. Thus the command gzip homepage.htm yields homepage.htm.gz.
One thing to remember about gzip is that it replaces the original file with one that has a .gz extension.
To uncompress files, use either gzip -d or gunzip.
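The round trip can be tried safely in a scratch directory. This is only a sketch: the generated file contents are invented, and the last step uses gzip's -c switch (write to standard output), which the text above does not mention, as one way to keep the original file.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A repetitive sample file so there is something to compress.
yes 'hello, tarball' | head -n 1000 > homepage.htm

# gzip replaces homepage.htm with homepage.htm.gz.
gzip homepage.htm
ls            # only homepage.htm.gz remains

# gunzip (or gzip -d) restores the original and removes the .gz file.
gunzip homepage.htm.gz
ls            # only homepage.htm remains

# To keep the original, send the compressed data to stdout instead.
gzip -c homepage.htm > homepage.htm.gz
ls            # both files are present
```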
bzip2 is similar to gzip. Like gzip, bzip2 replaces the original file, in this case with one that has a .bz2 extension. Decompressing .bz2 files is a breeze: use bzip2 -d or bunzip2.
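To see how the two tools compare on your own data, something like the following sketch can help. The sample text is generated purely for illustration, the results depend heavily on the input, and because bzip2 is not installed on every system the script checks for it first.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample text: 5000 numbered lines (invented test data).
seq 1 5000 | sed 's/^/log entry number /' > sample.txt

# -c writes to stdout, so sample.txt itself is left in place.
gzip -c sample.txt > sample.txt.gz

# bzip2 may be missing on minimal systems, so check before using it.
if command -v bzip2 >/dev/null 2>&1; then
    bzip2 -c sample.txt > sample.txt.bz2
fi

# Print the byte count of each file for comparison.
wc -c sample.txt*
```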
Both gzip and bzip2 maintain the ownership and permissions of the original file when compressing.
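A quick way to check this for gzip is to compare the permission string before and after compression. The file name and the 600 mode below are arbitrary choices for the sketch.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'private notes\n' > notes.txt
chmod 600 notes.txt          # owner read/write only

# Record the permission string (first 10 characters of ls -l).
before=$(ls -l notes.txt | cut -c1-10)

gzip notes.txt               # notes.txt becomes notes.txt.gz

after=$(ls -l notes.txt.gz | cut -c1-10)
echo "before: $before"
echo "after:  $after"
```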
You can also use the zip utility to compress files, if you wish to share files with friends who use a non-*nix platform. The command zip files.zip file1 file2 file3 would compress the three files, display the rate of compression of each file, and store them in files.zip. The unzip program can be used to extract the contents of a zip file.
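The zip workflow might look like the sketch below. The sample files and the unzipped directory are invented for the example, and the -l (list) and -d (destination directory) switches of unzip go beyond what the text describes. zip and unzip are often packaged separately, so the script checks for both before running.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'one\n'   > file1
printf 'two\n'   > file2
printf 'three\n' > file3

# zip and unzip are separate packages on many systems, so check first.
if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    # Unlike gzip, zip leaves the original files in place.
    zip files.zip file1 file2 file3

    # -l lists the archive contents; -d chooses the extraction directory.
    unzip -l files.zip
    unzip files.zip -d unzipped
else
    echo "zip/unzip not found; install them to try this example"
fi
```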
Compressed archives
Unlike zip, which offers compression and archiving functionality, tar is capable of archiving only. This means that a tarball is no smaller than the combined size of the files it contains; in fact it is slightly larger, because tar adds a header for each file and pads the data to fixed-size blocks. To reduce the size of a tarball, you must compress it by using either gzip or bzip2:
tar -cf archived.tar file1 file2 file3
gzip archived.tar
This compresses archived.tar and replaces it with archived.tar.gz. You could also use bzip2 instead of gzip.
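The two commands can also be collapsed into one by giving tar the z switch when creating the archive. The sketch below, run in a scratch directory with invented sample files, produces a tarball both ways and lists each one; onestep.tar.gz is just a name chosen for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'one\n'   > file1
printf 'two\n'   > file2
printf 'three\n' > file3

# Two steps, as in the text: archive first, then compress.
tar -cf archived.tar file1 file2 file3
gzip archived.tar                      # leaves archived.tar.gz

# One step: -z asks tar to gzip the archive as it is written.
tar -czf onestep.tar.gz file1 file2 file3

# Both tarballs list the same three members.
tar -tzf archived.tar.gz
tar -tzf onestep.tar.gz
```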
How do you extract files from a compressed tarball? Use tar zxvf archived.tar.gz to extract all the files from a gzip-compressed tarball. The z switch tells tar that the tarball was compressed using gzip, and the v switch makes tar list each file as it is extracted.
If you used bzip2 to compress this tarball, you’d get an error message if you used tar’s z switch. To decompress a bzip2-compressed tarball, you need to use the j switch in its place. The command tar jxvf archived.tar.bz2 would extract the files.
You may encounter compressed tarballs with other file extensions, such as tgz and tbz2. These are short for tar.gz and tar.bz2, respectively.
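Putting the pieces together, the following sketch creates a .tgz file from a small directory and extracts it twice. The project directory and its files are hypothetical. The second extraction relies on tar detecting the compression by itself, which recent GNU tar and bsdtar do when reading from a file; treat that as an assumption about your particular tar.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir project
printf 'readme\n' > project/README
printf 'data\n'   > project/data.txt

# Build a gzip-compressed tarball with the short .tgz extension.
tar -czf project.tgz project

# x = extract, z = gzip, v = list each file, f = archive name.
mkdir out
tar -xzvf project.tgz -C out

# Many tar versions detect the compression themselves when reading
# from a file, so plain -xf is often enough.
mkdir out2
tar -xf project.tgz -C out2
```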
Though it may seem complex at first, that’s all there is to archiving and compression under *nix. From here on, no matter what compression format you encounter, you’ll know what you need to do.
Shashank Sharma is studying for a degree in computer science. He specializes in writing about free and open source software for new users.