Author: Joe Barr
wget -options protocol://url
Let’s save the options for later and begin by looking at the protocol://url combination. As noted above, Wget groks HTTP, HTTPS, and FTP. Indicate how you want to talk to the remote site by specifying one of: http, https, or ftp. Like this:
wget -options ftp://url
As for the site, let’s try to get a complete copy of the current version of Slackware. It will be difficult because there is limited bandwidth available and the connections are rationed. Filling out the URL, our command looks like this before selecting the options:
wget -options ftp://carroll.cac.psu.edu/pub/linux/distributions/slackware/slackware-current/
Now about those options. We’ll only need two to get the job done: -c and -r. You can combine them into a single option so the complete command looks like this:
wget -cr ftp://carroll.cac.psu.edu/pub/linux/distributions/slackware/slackware-current/
The -c option tells wget to continue a previously executed wget or ftp session. This allows you to recover from network interruptions or outages without starting from byte zero. The -r option tells wget that this is a recursive request and that it should retrieve everything in and below the target URL.
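The -c option is also handy on its own for resuming a single interrupted download, provided the server supports it. A minimal sketch, using a placeholder URL rather than a real file:
wget -c http://example.com/big-download.iso
Run the same command again after a dropped connection and wget picks up where the partial file left off instead of fetching it all over again.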
As it happens, I was able to get a connection to the ftp server, but lost it before the entire contents of the directory had been retrieved. After trying 20 times to reconnect, wget threw up its hands in despair and quit, informing me that 1,000 files and 422 million bytes of data had been transferred. Given the suspiciously round number of files, I suspect the connection was cut off by a daily quota on the server rather than by ordinary network trouble.
In any case, there is another option, -t number, to specify the number of times to try to reconnect. The default is 20, but you can set it to any number you like. If you specify -t 0, wget will try an infinite number of times.
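To keep hammering away at that rationed Slackware mirror until the download completes, for example, you could combine -t 0 with the earlier command (same mirror URL as above):
wget -cr -t 0 ftp://carroll.cac.psu.edu/pub/linux/distributions/slackware/slackware-current/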
Wget a website
You can also use wget to create a local, browsable version of a Web site. Note that this method does not work on all sites, but works perfectly well on sites which rely on plain HTML to publish content. It doesn’t work well, for example, on sites like Linux.com. But for sites like The Dweebspeak Primer, it’s great.
We’ll replace the ftp protocol in the command line with http, and add a couple of new options in order to create a local, browsable version of the site. The -E option (case is important) tells wget to add an .html extension to each page it downloads that may have been generated by a CGI or that has an .asp extension, so that it is viewable locally. You may also want to add the -k and -K options. The -k option ensures that links are converted for local viewing. The -K option backs up the original version of a file with a “.orig” suffix, so that different stories generated with the same page name are not overwritten.
Here is what I used to duplicate my site:
wget -rEKk http://www.pjprimer.com
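By default, a recursive retrieval like this lands in a local directory named after the host, so you can point any browser at the converted pages. Assuming the site serves an index.html at its root:
firefox www.pjprimer.com/index.html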
Conclusion
As always with CLI Magic, this is an introduction to a command line tool, not a complete tutorial. Get to know the man page and use it to learn more about wget and other useful command line jewels.