Author: Karl Vogel
Locate and xargs
I was looking for a certain CSS stylesheet, and since I update my locate database every night, all I needed was a one-liner. This looks through my notebook files for any HTML documents, and checks each one for a stylesheet entry:
locate $HOME/notebook | grep '\.htm$' | xargs grep rel=.stylesheet
I was also looking for an example of how to create a solid border:
locate $HOME/notebook | grep '\.css$' | xargs grep solid
/home/vogelke/notebook/2005/0704/2col.css: border-left: 1px solid gray;
/home/vogelke/notebook/2005/0704/3col.css: border: 1px solid gray;
The locate databases on our file servers are also updated every night, so it’s easy to tell if someone’s deleted something since yesterday. This comes in handy when there’s an irate customer on the phone; if I can find their files using locate, it means the files were on the system as of late yesterday, so the customer can find out if someone in their workgroup did something clever this morning that trashed their files.
Either someone fesses up to deleting the files, or they’ve been moved; usually, the mover thought they were putting the files in one folder, and they ended up either hitting the parent folder by mistake or creating an entirely new folder somewhere else. A quick find will often fix this without my having to restore anything.
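Something like this is usually all it takes (a sketch; the tree and the filename pattern are hypothetical):

find /export/home/workgroup -name 'quarterly*' -print 2>/dev/null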
If I can’t find the files using locate, they were zapped at least a day or two ago, which generally means a trip to the backup server.
Tools to keep a sitelog
I learned the hard way (several times) that messing with a server and neglecting to write down what you did can easily screw up an entire weekend.
My first few attempts at writing a site-logging program weren't terribly successful. I've been working with the Air Force for nearly 25 years, and when someone from the federal government tells you that you tend to over-design things, your process clearly needs a touch-up. A basic text file with time-stamped entries solves 90% of the problem with about 10% of the effort.
The sitelog file format is pretty simple — think of a Weblog with all the entries jammed together in ascending time order. Timestamp lines are left-justified; everything else has at least four leading spaces or a leading tab. Code listings and program output are delimited by dashed lines ending with a single capital ‘S’ or ‘E’, for start and end. The whole idea was to be able to write a Perl parser for this in under an hour.
Here’s an example, created when I installed Berkeley DB. I’ve always used LOG for the filename, mainly because README was already taken.
BEGINNING OF LOG FOR db-4.4.20 ============================
Fri, 23 Jun 2006 19:34:15 -0400 Karl Vogel (vogelke at myhost)
To build:
https://localhost/mis/berkeley-db/ref/build_unix/intro.html
--------------------------------------------------------------S
me% cd build_unix
me% CC=gcc CFLAGS="-O" ../dist/configure --prefix=/usr/local
installing in /usr/local
checking build system type... sparc-sun-solaris2.8
checking host system type... sparc-sun-solaris2.8
[...]
config.status: creating db.h
config.status: creating db_config.h
me% make
/bin/sh ./libtool --mode=compile gcc -c -I. -I../dist/..
-D_REENTRANT -O2 ../dist/../mutex/mut_pthread.c
[...]
creating db_verify
/bin/sh ./libtool --mode=execute true db_verify
--------------------------------------------------------------E
Fri, 23 Jun 2006 20:32:34 -0400 Karl Vogel (vogelke at myhost)
Install:
--------------------------------------------------------------S
root# make install_setup install_include install_lib install_utilities
Installing DB include files: /usr/local/include ...
Installing DB library: /usr/local/lib ...
[...]
cp -p .libs/db_verify /usr/local/bin/db_verify
--------------------------------------------------------------E
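The format really is that easy to parse. Here's a minimal sketch of such a parser; this is my guess at what the hour's version might look like, not the author's actual script:

#!/usr/bin/perl
# Sitelog parser sketch: split a LOG file into timestamped entries
# and report how big each one is.
use strict;
use warnings;

my ($stamp, @body);

sub report {
    printf "ENTRY: %s (%d lines)\n", $stamp, scalar @body if defined $stamp;
}

while (<>) {
    chomp;
    if (/^\w{3}, \d{1,2} \w{3} \d{4} /) {   # left-justified timestamp line
        report();
        $stamp = $_;
        @body  = ();
    }
    elsif (defined $stamp) {
        push @body, $_;    # indented body: notes, code listings, output
    }
}
report();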
These scripts do most of the heavy lifting:
- The timestamp script writes a line holding the current time in ARPA-standard format, your full name, your userid, and the name of the host you're on. A short version is included below; I have a longer one that can parse most date formats and return a line with that time instead of the current time.
- The remark script starts Vim on the LOG file and puts me at the last line so I can append entries. The 'v' key (often unused) is mapped to call timestamp and append its output after the current line. (A sketch appears after the timestamp script below.)
- Mfmt originally stood for "make format"; it takes the output of make (or any program), breaks up and indents long lines to make them more readable, and wraps the whole thing in dashed lines ending with 'S' and 'E'. (A sketch of this also appears below.)
- The site2html script reads a LOG file and generates a decent-looking Web page, like this one.
- The log2troff script reads a LOG file and generates something that looks good on paper.
Here’s a short version of timestamp:
#!/bin/sh
PATH=/usr/local/bin:/bin:/usr/bin; export PATH
name=`grep "^$USER:" /etc/passwd | cut -f5 -d:`
host=`hostname | cut -f1 -d.`
exec date "+%n%n%a, %d %b %Y %T %z $name ($USER at $host)%n"
exit 0
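For comparison, here is what a minimal remark might look like; this is a sketch assuming timestamp is on your PATH and the LOG file is in the current directory, not the author's actual script:

#!/bin/sh
# remark sketch: edit the sitelog, starting at the last line.
# 'v' reads a fresh timestamp in below the cursor.
exec vim -c '$' -c 'map v :r !timestamp<CR>' LOG

The heart of mfmt is also small enough to sketch. This version just indents its input, folds long lines, and adds the dashed S/E delimiters; the fold width and indentation are my guesses:

#!/bin/sh
# mfmt sketch: wrap stdin in the dashed start/end delimiter lines,
# folding long lines and indenting everything four spaces.
echo '--------------------------------------------------------------S'
fold -s -w 66 | sed -e 's/^/    /'
echo '--------------------------------------------------------------E'

Something like make 2>&1 | mfmt >> LOG drops a build transcript into the sitelog in the right shape.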
If I know I've logged something, it's also nice to be able to run something like

locate LOG | xargs grep something

to find it.
The w3m text Web browser
W3m is a text-based Web browser which does a wonderful job of rendering HTML tables correctly. If I want a halfway-decent text-only copy of a Web page that includes tables, I run a script that calls wget to fetch the HTML page and then w3m to render it:
1 #!/bin/ksh
2 # Fetch files via wget, w3m. Usage: www URL
3
4 PATH=/usr/local/bin:$PATH
5 export PATH
6
7 die () {
8 echo "$*" >&2
9 exit 1
10 }
11
12 #
13 # Don't go through a proxy server for local hosts.
14 #
15
16 case "$1" in
17 "") die "usage: $0 url" ;;
18 *local*) opt="--proxy=off $1" ;;
19 http*) opt="$1" ;;
20 ftp*) opt="$1" ;;
21 esac
22
23 #
24 # Fetch the URL back to a temporary file using wget, then render
25 # it using w3m: better support for tables.
26 #
27
28 tfile="wget.$RANDOM.$$"
29 wget -F -O $tfile $opt
30 test -f $tfile || die "wget failed"
31
32 #
33 # Set the output width from the environment.
34 #
35
36 case "$WCOLS" in
37 "") cols=70 ;;
38 *) cols="$WCOLS" ;;
39 esac
40
41 w3m="/usr/local/bin/w3m -no-graph -dump -T text/html -cols $cols"
42 result="w3m.$RANDOM.$$"
43 $w3m $tfile > $result
44
45 test -f "$result" && $EDITOR $result
46 rm -f $tfile
47 exit 0
Line 18 lets me specify URLs on the local subnet which should not go through our proxy server; traffic through that server is assumed to be coming from the outside world, which requires a username and password.
Lines 28 and 42 create safe temporary files by taking advantage of the Korn shell’s ability to generate random numbers.
I call wget on line 29, using the -F option to force any input to be treated as HTML. The -O option lets me pick the output filename. You might be able to use w3m to do everything, but here it seems to have some problems with the outgoing proxies (which I don’t control), and wget doesn’t.
Lines 36-39 let me specify the output width as an environment variable:
WCOLS=132 www http://some.host/url
would give me wider output for landscape printing. When w3m returns, you're placed in an editor in case you want to make any final touch-ups. After you exit the editor, you should have a new file in the current directory named something like w3m.19263.26012.
Dealing with different archive formats
I got fed up with remembering how to deal with archives that might be tar files, zip files, compressed, frozen, gzipped, bzipped, or whatever bizarre format comes along next. Three short scripts take care of that for me:
- tc: shows the contents of an archive file
- tcv: shows the verbose contents of an archive file
- tx: extracts the contents of an archive file into the current directory
Tc and tcv are hard-linked together:
#!/bin/sh
# tc: show the contents of an archive file.
# If invoked as "tcv", print a verbose listing.

case "$#" in
    0) exit 1 ;;
    *) file="$1" ;;
esac

name=`basename $0`
case "$name" in
    tcv) opt='tvf' ;;
    *)   opt='tf' ;;
esac

case "$file" in
    *.zip)    exec unzip -lv "$file" ;;
    *.tgz)    exec gunzip -c "$file" | tar $opt - ;;
    *.bz2)    exec bunzip2 -c "$file" | tar $opt - ;;
    *.tar.gz) exec gunzip -c "$file" | tar $opt - ;;
    *.tar.Z)  exec uncompress -c "$file" | tar $opt - ;;
    *)        exec tar $opt "$file" ;;
esac
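The second name is just a hard link:

me% ln tc tcv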
Tx is very similar:
#!/bin/sh
# tx: extract an archive file (optionally just the entries matching $pat).

case "$#" in
    0) exit 1 ;;
    *) file="$1"; pat="$2" ;;
esac

case "$file" in
    *.zip)    exec unzip -a "$file" ;;
    *.tgz)    exec gunzip -c "$file" | tar xvf - $pat ;;
    *.bz2)    exec bunzip2 -c "$file" | tar xvf - $pat ;;
    *.tar.gz) exec gunzip -c "$file" | tar xvf - $pat ;;
    *)        exec tar xvf "$file" $pat ;;
esac
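For example, using the Berkeley DB tarball from the sitelog example above:

me% tc db-4.4.20.tar.gz      # brief table of contents
me% tcv db-4.4.20.tar.gz     # verbose listing
me% tx db-4.4.20.tar.gz      # extract into the current directory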
Zsh aliases
I’ve tried GNU Bash and tcsh, but Zsh is definitely my favorite. Here are some of my aliases:
To view command-line history:

alias h='fc -l 1 | less'
alias history='fc -l 1'

To check the tail end of the syslog file:

alias syslog='less +G /var/log/syslog'

To beep my terminal when a job's done (i.e., /run/long/job && yell):

alias yell='echo done | write $LOGNAME'

To quickly find all the directories or executables in the current directory:

alias d='/bin/ls -ld *(-/)'
alias x='ls -laF | fgrep "*"'

For listing dot-files:

alias dot='ls -ldF .[a-zA-Z0-9]*'

Largest files shown first or last:

alias lsl='ls -ablprtFT | sort -n +4'
alias lslm='ls -ablprtFT | sort -n +4 -r | less'

Smallest files shown first or last:

alias lss='ls -ablprtFT | sort -n +4 -r'
alias lssm='ls -ablprtFT | sort -n +4 | less'

Files sorted by name:

alias lsn='ls -ablptFT | sort +9'
alias lsnm='ls -ablptFT | sort +9 | less'

Newly-modified files shown first or last:

alias lst='ls -ablprtFT'
alias lstm='ls -ablptFT | less'

Converting decimal to hex and back:

alias d2h='perl -e "printf qq|%X\n|, int( shift )"'
alias h2d='perl -e "printf qq|%d\n|, hex( shift )"'
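A quick sanity check that the pair round-trips:

me% d2h 255
FF
me% h2d FF
255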
Most of these aliases (except for the fc stuff) work just fine in bash, with a few minor tweaks in the formatting. Some examples:
alias 1='%1'
alias 2='%2'
alias 3='%3'
alias 4='%4'
alias 5='%5'
alias 6='%6'
alias 7='%7'
alias 8='%8'
alias 9='%9'
alias d2h='perl -e "printf qq|%X\n|, int(shift)"'
alias d='(ls -laF | fgrep "/")'
alias dot='ls -ldF .[a-zA-Z0-9]*'
alias h2d='perl -e "printf qq|%d\n|, hex(shift)"'
alias h='history | less'
alias j='jobs -l'
alias p='less'
alias x='ls -laF | fgrep "*"'
alias z='suspend'
If you want to pass arguments to an alias, it might be easier to use a function. For example, I use mk to make a new directory with mode 755, regardless of my umask setting. The $* will be replaced by whatever arguments you pass:
mk () {
mkdir $*
chmod 755 $*
}
You can use seq to generate sequences, like 10 to 20:
seq () {
local lower upper output;
lower=$1 upper=$2;
while [ $lower -le $upper ];
do
output="$output $lower";
lower=$[ $lower + 1 ];
done;
echo $output
}
For example, running seq 10 20 generates this:
10 11 12 13 14 15 16 17 18 19 20
Functions can call other functions. For example, if you want to repeat a given command some number of times, try the repeat function:
repeat () { local count="$1" i; shift; for i in $(seq 1 "$count"); do eval "$@"; done }
Running repeat 10 'date; sleep 1' will produce this:
Wed Jul 5 21:29:18 EDT 2006
Wed Jul 5 21:29:19 EDT 2006
Wed Jul 5 21:29:20 EDT 2006
Wed Jul 5 21:29:21 EDT 2006
Wed Jul 5 21:29:22 EDT 2006
Wed Jul 5 21:29:23 EDT 2006
Wed Jul 5 21:29:24 EDT 2006
Wed Jul 5 21:29:25 EDT 2006
Wed Jul 5 21:29:26 EDT 2006
Wed Jul 5 21:29:27 EDT 2006
Shell aliases for process control
I spend most of my time in an xterm flipping around between programs, and it's nice to be able to suspend and restart jobs quickly. On my workstation, I always have Emacs and a root shell running as the first two jobs. Under the Z shell (Zsh), I use j as an alias for jobs -dl, so running j produces the following:
[1] 92178 suspended sudo ksh
(pwd : ~)
[2] - 92188 suspended emacs
(pwd : ~)
[3] + 96064 suspended vi 003-shell-alias.mkd
(pwd : ~/notebook/2006/0618/newsforge-article)
This way, I get the process IDs (in case something gets wedged) plus the working directories for each process.
Zsh lets you bring a job to the foreground by typing a percent-sign followed by the job number. I hate typing two characters when one’s enough, so these aliases are convenient:
alias 1='%1'
alias 2='%2'
alias 3='%3'
alias 4='%4'
alias 5='%5'
alias 6='%6'
alias 7='%7'
alias 8='%8'
alias 9='%9'
alias z='suspend'
I can type 1 to become root, check something quickly, and then just type 'z' to become me again. Or I can type 2 to enter Emacs, and so forth. Note that this works in Bash as well.
Using PGP to create a password safe
How many different passwords do you have to remember, and how often do you have to change them? Lots of organizations seem to believe that high change frequency makes a password safe, even if the one you ultimately pick is only three characters long or your name spelled backwards.
PGP or the GNU Privacy Guard (GPG) can help you keep track of dozens of nice, long passwords, even if you have to change them weekly. There are several commercial packages which serve as password safes, but PGP is free, and all you need is a directory with one script to encrypt your password list and one to decrypt it.
The most important thing to remember: do not use the password for your safe for anything else!
I use GNU Privacy Guard for encryption, but any strong crypto will do. You can set up your own private/public key in just a few minutes by following the directions in the GNU Privacy Handbook. Let’s say you put your passwords in the file pw. Follow these steps to create a GPG public/private keypair and encrypt the password file:
First, generate a keypair by running gpg --gen-key and follow the prompts; suggested responses are shown after each prompt:
gpg (GnuPG) 1.4.1; Copyright (C) 2005 Free Software Foundation, Inc.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions. See the file COPYING for details.

Please select what kind of key you want:
   (1) DSA and Elgamal (default)
   (2) DSA (sign only)
   (5) RSA (sign only)
Your selection? [hit return]
DSA keypair will have 1024 bits.
ELG-E keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) [hit return]
Requested keysize is 2048 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) [hit return]
Key does not expire at all
Is this correct? (y/N) y

You need a user ID to identify your key; the software constructs the user ID
from the Real Name, Comment and Email Address in this form:
    "Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"

Real name: Your Name
Email address: yourid@your.host.com
Comment:
You selected this USER-ID:
    "Your Name <yourid@your.host.com>"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? o
You need a Passphrase to protect your secret key.
[enter your passphrase]
Next, generate a revocation certificate (in case you forget your passphrase or your key's been compromised) using gpg --output revoke.asc --gen-revoke "Your Name". GPG will then walk you through a series of questions:
sec  1024D/B3D36900 2006-06-27  Your Name

Create a revocation certificate for this key? (y/N) y
Please select the reason for the revocation:
  0 = No reason specified
  1 = Key has been compromised
  2 = Key is superseded
  3 = Key is no longer used
  Q = Cancel
(Probably you want to select 1 here)
Your decision? 1
Enter an optional description; end it with an empty line:
> Revoking my key just in case it gets lost
>
Reason for revocation: Key has been compromised
Revoking my key just in case it gets lost
Is this okay? (y/N) y

You need a passphrase to unlock the secret key for
user: "Your Name <yourid@your.host.com>"
1024-bit DSA key, ID B3D36900, created 2006-06-27

ASCII armored output forced.
Revocation certificate created.
Your revocation key is now in the file revoke.asc. Store it on a medium which you can hide, like a USB key; otherwise someone can use it to render your key unusable.
Next, export your public key using gpg --armor --output public.gpg --export yourid@your.host.com. This stores your public key in public.gpg, in case you want to put it on your website or mail it to someone.
Now, encrypt the pw file with gpg --armor --output pw.gpg --encrypt --recipient yourid@your.host.com pw, which encrypts the pw file as pw.gpg. To decrypt it, you must include your own key in the --recipient list.
Test decrypting the pw file with gpg --output testpw --decrypt pw.gpg. GPG will quiz you for the passphrase, like this:
You need a passphrase to unlock the secret key for
user: "Your Name <yourid@your.host.com>"
2048-bit ELG-E key, ID 19DF3967, created 2006-06-27 (main key ID B3D36900)

Enter passphrase:
gpg: encrypted with 2048-bit ELG-E key, ID 19DF3967, created 2006-06-27
     "Your Name <yourid@your.host.com>"
The file testpw should be identical to pw, or something’s wrong.
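cmp(1) makes that check a one-liner; it prints nothing when the files match:

me% cmp pw testpw && echo ok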
I use one script with two hard links for reading and updating passwords. When invoked as readp, the script decrypts my password safe; when I have to modify the decrypted file, updatep encrypts it.
#!/bin/ksh
# Read or encrypt a file.
# Use with GPG v1.4.1 or better.

PATH=/bin:/usr/bin:/usr/sbin:/usr/local/bin
export PATH
name=`basename $0`

case "$1" in
    "") file="pw" ;;
    *)  file=$1 ;;
esac

# clear = plaintext file.
# enc   = ascii-armor encrypted file.

case "$file" in
    *.gpg) enc=$file
           clear=`echo $file | sed -e 's/\.gpg$//'`
           ;;

    *)     clear="$file"
           enc="$file.gpg"
           ;;
esac

case "$name" in
    "readp")
        if test -f "$enc"
        then
            gpg --output $clear --decrypt $enc
        else
            echo "encrypted file $enc not found"
        fi
        ;;

    "updatep")
        if test -f $clear
        then
            mv $enc $enc.old
            gpg --armor --output $enc --encrypt \
                --recipient vogelke@pobox.com $clear && rm $clear
        else
            echo "cleartext file $clear not found"
        fi
        ;;
esac

exit 0
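A typical session looks like this:

me% readp          # decrypts pw.gpg into pw
me% vi pw          # add or change an entry
me% updatep        # re-encrypts pw and removes the plaintext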
Mutt
Mutt is very useful for taking a quick look at a mailbox or correctly sending messages with attachments; I've seen people concatenate a few files together and pipe the results to mail, but there's more to it than that. If you poke around Google for a while, you can find many setups that make Mutt quite suitable for general mail handling. Dave Pearson's site, in particular, has some great configuration files.
Setting up a full-text index for code and documents
I started trying to index my files for fast lookup back when WAIS was all the rage; I also tried Glimpse and Swish-e, neither of which really did the trick for me.
The QDBM, Estraier, and Hyper-estraier programs are without a doubt the best full-text index and search programs I’ve ever used. They’re faster and less memory-intensive than any version of Swish, and the Hyper-estraier package includes an excellent CGI program which lets you do things like search for similar files.
Keeping a text version of my browser history
I know Mozilla and Firefox store your history for you, but it’s either for a limited time, or you end up with the history logfile from hell. If I have logfiles that are updated on the fly, I’d rather keep them relatively small.
Having command-line access to my browser links from any given day lets me search my browser history using the same interface as I use for my regular files (Estraier), as well as standard command-line tools. I keep my working files in dated folders, and I was recently looking for something I did in June on the same day that I looked up some outlining sites. Using locate, xargs, and grep I was able to find what I was looking for:
locate browser-history | xargs grep -i outlin
.../notebook/2006/0610/browser-history: 19:07:27 http://webservices.xml.com/pub/a/ws/2002/04/01/outlining.html
.../notebook/2006/0610/browser-history: 19:07:27 http://www.oreillynet.com/pub/a/webservices/2002/04/01/outlining.html
.../notebook/2006/0610/browser-history: 19:08:57 http://radio.weblogs.com/0001015/instantOutliner/daveWiner.opml
.../notebook/2006/0610/browser-history: 19:10:27 http://www.deadlybloodyserious.com/instantOutliner/garthKidd.opml
.../notebook/2006/0610/browser-history: 19:10:43 http://www.decafbad.com/deus_x/radio/instantOutliner/l.m.orchard.opml
.../notebook/2006/0610/browser-history: 19:10:47 http://radio.weblogs.com/0001000/instantOutliner/jakeSavin.opml
To get the text history, I use a Perl script by Jamie Zawinski which parses the Mozilla history file. According to Zawinski, the history format is “just about the stupidest file format I’ve ever seen,” and after trying to write my own parser for it, I agree.
The cron script below is run every night at 23:59 to store my browser history (minus some junk) in my notebook.
 1 #!/bin/sh
 2 # mozhist: save mozilla history for today
 3
 4 PATH=/bin:/usr/bin:/usr/local/bin:$HOME/bin
 5 export PATH
 6 umask 022
 7
 8 # your history file.
 9 hfile="$HOME/.mozilla/$USER/nwh6n09i.slt/history.dat"
10
11 # sed script
12 sedscr='
13 s/\/$//
14 /view.atdmt.com/d
15 /ad.doubleclick.net/d
16 /tv.yahoo.com/d
17 /adq.nextag.com\/buyer/d
18 '
19
20 # remove crap like trailing slashes, doubleclick ads, etc.
21 set X `date "+%Y %m %d"`
22 case "$#" in
23     4) yr=$2; mo=$3; da=$4 ;;
24     *) exit 1 ;;
25 esac
26
27 dest="$HOME/notebook/$yr/${mo}${da}"
28 test -d "$dest" || exit 2
29
30 exec mozilla-history $hfile |          # get history...
31     sed -e "$sedscr" |                 # ... strip crap ...
32     sort -u |                          # ... remove duplicates ...
33     tailocal |                         # ... change date to ISO ...
34     grep "$yr-$mo-$da" |               # ... look for today ...
35     cut -c12- |                        # ... zap the date ...
36     cut -f1,3 |                        # ... keep time and URL ...
37     expand -1 > $dest/browser-history  # ... and store
38
39 exit 0
Jamie's Perl script, mozilla-history, is called on line 30.
The tailocal utility (line 33) is a program written by Dan Bernstein which reads lines timestamped with the raw Unix date and writes them with an ISO-formatted date, so echo 1151637537 howdy | tailocal converts the timestamp to:
2006-06-29 23:18:57 howdy
If you don’t have tailocal, here’s a short Perl equivalent:
#!/usr/bin/perl
# Convert a leading Unix timestamp to an ISO-formatted local date.
use POSIX qw(strftime);
while (<>) {
    if (m/(\d+)\s(.*)/) {
        print strftime("%Y-%m-%d %T ", localtime($1)), "$2\n";
    }
}
exit (0);
The resulting file has entries for one day which look like this:
15:55:27 http://mediacast.sun.com/share/bobn/SMF-migrate.pdf
16:02:36 http://www.sun.com/bigadmin/content/selfheal
Reading whitespace-delimited fields
At least once a day, I need the third or fourth column of words from either an existing file or the output of a program. It’s usually something simple like checking the output from ls -lt, weeding a few things out by eye, and then getting just the filenames for use elsewhere.
I use one script with nine hard links: f1 is the script, and f2 through f9 are links to it. If I run the script as f1, I get the first field; f2 shows the second field, and so on.
The script is just a wrapper for awk:
#!/bin/sh
# print space-delimited fields.

PATH=/bin:/usr/bin; export PATH
tag=`basename $0`

case "$tag" in
    f1) exec awk '{print $1}' ;;
    f2) exec awk '{print $2}' ;;
    f3) exec awk '{print $3}' ;;
    f4) exec awk '{print $4}' ;;
    f5) exec awk '{print $5}' ;;
    f6) exec awk '{print $6}' ;;
    f7) exec awk '{print $7}' ;;
    f8) exec awk '{print $8}' ;;
    f9) exec awk '{print $9}' ;;
    *)  ;;
esac
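Setting up the links is a one-time job (assuming the script lives in ~/bin), and on most systems the filename is the ninth field of ls -l output:

me% cd ~/bin
me% for i in 2 3 4 5 6 7 8 9; do ln f1 f$i; done
me% ls -lt | f9      # just the filenames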
Using ifile for SPAM control
If you’re still plagued by spam, or you need a generic method of categorizing text files, have a look at ifile. It’s one of many “Bayesian mail filters”, but unlike bogofilter and spamassassin, it can do n-way filtering rather than simply spam vs. non-spam.
Karl is a Solaris/BSD system administrator at Wright-Patterson Air Force Base, Ohio. He graduated from Cornell University with a BS in Mechanical and Aerospace Engineering, and joined the Air Force in 1981. After spending a few years on DEC and IBM mainframes, he became a contractor and started using Berkeley Unix on a Pyramid system. He likes FreeBSD, trashy supermarket tabloids, Perl, cats, teen-angst TV shows, and movies.
Let us know about your most valuable utilities and how you use them. There need not be 10 of them, nor do they need to be in order, and if we publish your work, we’ll pay you $100.