Author: Leslie P. Polzer
First, since solving problems is easier when you don’t have to do it yourself, let’s find out whether somebody has already handled this problem. A few Google searches later, it’s evident that the few available tools are all for Microsoft Windows, and like most programs for Windows, they are not free of charge and limit your freedom.
For Linux, there’s GPL-licensed WebMonX, but it’s a GUI tool that requires lots of clicking and notifies you with popups and sounds. If that’s your thing, fine — you have found a ready-made solution that suits your needs. If not, let’s try writing a simple script that meets some KISS criteria:
- Unobtrusive: popping up a message on every change is a no-no. An email message should do the job nicely.
- Small: only a few lines of code.
- Modular: should rely on widely available and well-tested components.
- Smart: when changes are detected, we want a diff of them.
We need a text browser, for example w3m, to get the pages in rendered form. Just grabbing the raw HTML or the HTTP response would do, of course, but it’s not nice to look at. Second, we’ll use a hash program like md5sum or sha1sum (both of which can be found in the GNU Coreutils package) to generate a name for the file where we store a snapshot of the page. Then we need a working diff and, finally, an implementation of the mail command, which usually comes from a mailx package and relies on your local MTA for delivery. We can also use some basic utilities that should be installed on every system, such as wc and touch.
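To make the naming idea concrete, here is a minimal sketch of how a URL can be turned into a snapshot filename. The helper name snapshot_name and the example.org URLs are assumptions made for illustration; only the md5sum and cut calls come from the approach described above.

```shell
#!/bin/sh
# Sketch: map a URL to a snapshot filename via its MD5 hash.
# snapshot_name is a hypothetical helper name used for illustration.
snapshot_name() {
    # md5sum prints "<hash>  -" when reading stdin; keep only the hash field.
    hash=$(printf '%s\n' "$1" | md5sum | cut -d ' ' -f 1)
    printf '%s.txt\n' "$hash"
}

# The same URL always yields the same filename, so a later run
# finds the snapshot written by the previous one.
snapshot_name "http://example.org/news.html"
```

Hashing avoids having to escape slashes, question marks and other characters that are awkward in filenames.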
When everything is in place, we can use the following shell script to do our tracking task. It scans the file list.txt, reading one URL from each line. We get a current version of the URL’s contents and compare it with the saved version, then send changes, if there are any, to the email address specified in the RECIP variable.
#!/bin/sh
# webtrack.sh

RECIP=user@host       # where notifications get sent
DUMPCMD="w3m -dump"   # text browser invocation

for url in $(cat list.txt); do
    # name the snapshot file after the hash of the URL
    md5=$(echo "$url" | md5sum | cut -d ' ' -f 1)
    touch "$md5.txt"
    $DUMPCMD "$url" > tmp.txt
    if diff "$md5.txt" tmp.txt >/dev/null; then
        : # echo no changes
    else
        : # echo "changes: "
        diff -Napu "$md5.txt" tmp.txt > diff.txt
        mv tmp.txt "$md5.txt"
        mail -s "Changes in $url found." "$RECIP" <<eof
The diff has $(wc -l < diff.txt) lines. Changes are below.

$(cat diff.txt)
eof
    fi
done
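The compare-and-update step at the heart of the script can be tried in isolation, without a network connection or a mail setup. The following is a sketch under stated assumptions: update_snapshot is a hypothetical helper name, the files live in a temporary directory, and cmp is swapped in for the first silent diff call because it only has to answer whether the files differ.

```shell
#!/bin/sh
# Sketch of the compare-and-update step, using local files only.
# update_snapshot is a hypothetical helper name.
# Usage: update_snapshot SNAPSHOT FRESH
# Prints a unified diff and returns 0 if the content changed, returns 1 otherwise.
update_snapshot() {
    snapshot=$1
    fresh=$2
    touch "$snapshot"                      # first run: compare against an empty file
    if cmp -s "$snapshot" "$fresh"; then
        rm -f "$fresh"
        return 1                           # identical, nothing to report
    fi
    diff -u "$snapshot" "$fresh" || true   # diff exits 1 when files differ
    mv "$fresh" "$snapshot"                # the fresh copy becomes the new snapshot
    return 0
}

# Small offline demonstration with made-up page contents.
dir=$(mktemp -d)
printf 'old line\n' > "$dir/page.txt"
printf 'new line\n' > "$dir/tmp.txt"
update_snapshot "$dir/page.txt" "$dir/tmp.txt"
rm -rf "$dir"
```

In a tracking loop, the diff printed by the helper would be what gets piped to mail when the return status is 0.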
Now just populate list.txt with one URL per line, make the script executable (chmod 755 webtrack.sh) and set up a cronjob for it with an entry like this in your crontab file:
0 8 * * * /path/to/webtrack.sh
This will check the sites in list.txt every morning at 8 a.m. Check the crontab(5) man page, which describes the file format, if you are not sure what to do with this line.
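One detail worth keeping in mind: the script reads list.txt and writes its snapshots relative to the current directory, and cron typically starts jobs in your home directory. A possible way around this is to let the crontab entry change into the script's directory first. The sketch below builds such an entry; cron_entry is a hypothetical helper name and /path/to is the placeholder path used in the text.

```shell
#!/bin/sh
# Sketch: build a crontab line that runs webtrack.sh from its own directory.
# cron_entry is a hypothetical helper name.
# Usage: cron_entry HOUR DIRECTORY
cron_entry() {
    hour=$1
    dir=$2
    # Fields: minute hour day-of-month month day-of-week command
    printf '0 %s * * * cd %s && ./webtrack.sh\n' "$hour" "$dir"
}

cron_entry 8 /path/to

# One way to install the line is to append it to the existing crontab:
#   (crontab -l 2>/dev/null; cron_entry 8 /path/to) | crontab -
```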
It’s also nice to have a script that appends a new URL to list.txt. For local lists, we can just use echo directly to append the URL. For a remote list, we execute echo remotely via ssh.
#!/bin/sh
# ww-add.sh

# if the list is local
echo "$1" >> /path/to/list.txt

# if the list is remote, use this line instead
# ssh user@host "echo '$1' >> /path/to/list.txt"
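If the same URL gets added twice, it will be fetched and compared twice on every run. As an optional refinement, here is a sketch of a local variant that skips URLs already on the list. The helper name add_url and the example.org URL are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: append a URL to a local list only if it is not already there.
# add_url is a hypothetical helper name.
# Usage: add_url LIST URL
add_url() {
    list=$1
    url=$2
    touch "$list"
    # -q: quiet, -x: match the whole line, -F: treat the URL as a fixed string
    if grep -qxF -- "$url" "$list"; then
        return 0                      # already tracked, nothing to do
    fi
    printf '%s\n' "$url" >> "$list"
}

# Small demonstration on a temporary list: the second call is a no-op.
demo_list=$(mktemp)
add_url "$demo_list" "http://example.org/news.html"
add_url "$demo_list" "http://example.org/news.html"
cat "$demo_list"
rm -f "$demo_list"
```

The -F flag matters here because URLs often contain dots and question marks, which grep would otherwise interpret as pattern characters.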
Happy tracking!
We can easily learn from this little exercise that shell scripts can make our life easier and save us hours of time compared to doing things manually over and over.
Leslie P. Polzer is a free software consultant and writer who has plenty of experience in leaving chores to the computer.