Firefox, Chrome, and other browsers do an acceptable job of downloading a single file of reasonable size. But I don’t like to trust a browser to grab ISO images and other files that are hundreds of megabytes or larger. For that I prefer to turn to wget. You’ll find that using wget provides some significant advantages over grabbing files with your browser.
First of all, there’s the obvious — if your browser crashes or you need to restart for some reason, you don’t lose the download. Firefox and Chrome have been fairly stable for me lately, but it’s not unheard of for them to crash. That’s a bit of a bummer if they’re 75% of the way (or 98%) through downloading a 3.6GB ISO for the latest Fedora or openSUSE DVD.
It’s also inconvenient when I want to download a file directly onto a server. For example, if I’m setting up WordPress on a remote system, I need to get the tarball with the latest release onto that server. It seems silly to download it to my desktop and then use scp to upload it to the server; that takes at least twice the time. Instead, I use wget to grab the tarball while I’m SSH’ed into the server and save myself a few minutes.
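For instance, once I’m SSH’ed into the server, grabbing the current WordPress release is a one-liner (wordpress.org offers a latest.tar.gz link; check the project’s download page if it has moved):
wget http://wordpress.org/latest.tar.gz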
Finally, wget is scriptable. If you want to scrape a Web site or download a file every day at a certain time, you can use wget as part of a script that you call from a cron job. Hard to do that with Firefox or Chrome.
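As a quick sketch, a crontab entry like this one would quietly fetch a file into /srv/downloads every morning at 2 a.m. (the URL and directory are placeholders; -q keeps wget quiet and -P sets the download directory):
0 2 * * * wget -q -P /srv/downloads http://mirrorsite.net/pub/files/nightly.iso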
Get Started with wget
Most Linux distributions should have wget installed, but if not, just search for the wget package. Several other packages use or reference wget, so you’ll probably get several results, including a few front-ends for wget.
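If it isn’t installed, your package manager will take care of it. Depending on your distribution, one of these should do the trick:
sudo apt-get install wget
sudo yum install wget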
Let’s start with something simple. You can download files over HTTP, FTP, and HTTPS with wget, so let’s say you want to get the hot new Linux Mint Fluxbox edition. Just copy the URL to the ISO image and pass it to wget like so:
wget http://ftp.mirrorsite.net/pub/linuxmint/stable/9/linuxmint-9-fluxbox-cd-i386.iso
Obviously, you’d replace “mirrorsite” with a legitimate site name, and the path to the ISO image with the correct path.
What about multiple files? Here’s where wget really starts showing its advantages. Create a text file with the URLs to the files, one per line. For instance, if I wanted to download the CD ISO images for the Fedora 14 alpha, I’d copy the URLs for each install ISO into a text file like this:
http://mirrorsite.net/pub/fedora/14/cd1.iso
http://mirrorsite.net/pub/fedora/14/cd2.iso
http://mirrorsite.net/pub/fedora/14/cd3.iso
You get the idea. Save the file as fedoraisos.txt or similar, and then tell wget to download all of the ISO images:
wget -i fedoraisos.txt
Now wget will start grabbing the ISOs in the order they appear in the text file. That might take a while, depending on the speed of your Net connection, so what happens if the transfer is interrupted? No sweat. If wget is running but the network goes down, it will continue trying to fetch the file and resume where it left off.
But what if the computer crashes or you need to stop wget for some other reason? The wget utility has a “continue” option (-c) that can be used to resume a download that’s been interrupted. Just restart the download, adding the -c option before the URL, like so:
wget -c ftp://mirrorsite.net/filename.iso
If you simply re-run the command without -c after wget has been stopped, it will start from scratch and save to a new file with a .1 appended to the original filename. That’s wget trying to protect you from “clobbering” a previous file.
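To see the difference, try the same placeholder download both ways. The first command leaves the partial file alone and writes to filename.iso.1; the second appends to the existing partial filename.iso instead:
wget ftp://mirrorsite.net/filename.iso
wget -c ftp://mirrorsite.net/filename.iso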
Mirroring and More
You can also use wget to mirror a site. Using the --mirror option, wget will try to suck down the entire site, following links recursively to grab everything it thinks is necessary for the site.
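For example, with a placeholder site name:
wget --mirror http://www.example.com/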
Unless you own a site and are trying to make a backup, the --mirror option might be a bit aggressive. If you’re just trying to save a page for archival purposes, the -p (page requisites) option might be better. When wget is finished, it will create a directory with the site’s name (so if you tried Linux.com, it’d be linux.com) and all of the requisite files underneath. Odds are the site won’t look quite right when you open it in a browser, but it’s a good way to get the content of a site.
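For instance, to save a single page along with the images and stylesheets it needs, and optionally rewrite the links for local viewing with -k (--convert-links), something like this works, with the URL as a placeholder:
wget -p -k http://www.example.com/some-article.html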
Password-protected sites are not a problem either, as wget supports several options for passing the username and password to a site. Just use the --user and --password options, like so:
wget --user=username --password=password ftp://mirrornet.net/filename.file
where the username and password are replaced with your credentials. Be careful doing this on a shared system, though: other users may be able to see the username and password via top, ps, or similar.
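If your version of wget is recent enough, the --ask-password option is a nice alternative; it prompts for the password interactively so it never shows up on the command line at all:
wget --user=username --ask-password ftp://mirrornet.net/filename.file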
Sometimes a site will deny access to non-browser user agents. If this is a problem, wget can fake the user agent string with --user-agent=agent-string.
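For example (the exact agent string here is just an illustration; use whatever string the site expects from a browser):
wget --user-agent="Mozilla/5.0 (X11; Linux x86_64)" http://mirrorsite.net/filename.iso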
If you don’t have the fastest connection in the world, you might want to throttle wget a bit so it doesn’t consume your available bandwidth, or so it doesn’t hammer a remote site if you are on a fast connection. To do that, you can use the --limit-rate option, like this:
wget --limit-rate=2m http://filesite.net/filename.iso
That will tell wget to cap its downloads at 2 megabytes per second, though you can also use k to specify kilobytes per second.
If you’re grabbing a bunch of files, the -w (wait) option can pause wget between files. So wget --wait=1m will pause wget for one minute between downloads.
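Putting a few of these together, you could politely fetch the batch of ISOs from the earlier fedoraisos.txt list, capped at 500 kilobytes per second, waiting one minute between files, and resuming any partial downloads:
wget -c -i fedoraisos.txt --limit-rate=500k --wait=1m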
There’s a lot more to wget, so be sure to check the man page to see all the options. In a future tutorial, we’ll cover using wget for more complex tasks and examining HTTP responses from Apache.