Author: Mark Alexander Bain
A pipe is a means by which the output from one process becomes the input to a second. In technical terms, the standard output (stdout) of one command is sent to the standard input (stdin) of a second command. If you are not sure of the advantages this creates, then let’s look at a simple example.
In this example, we’ll send a directory listing to an email account.
ls -l ~ > ls.tmp
mail -s "Home directory listing" info@markbain-writer.tk < ls.tmp
This works well, but it’s rather cumbersome and requires the creation of an interim file. The use of a pipe allows a simpler command structure and needs no extra files:
ls -l ~ | mail -s "Home directory listing" info@markbain-writer.tk
You will notice that a pipe is defined by the | symbol — not an uppercase i or the number one, but a vertical bar.
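To see that the two approaches really are equivalent without sending any mail, here is a minimal sketch. It is only an illustration: wc -l stands in for mail, the sample function and its contents are made up, and the interim file comes from mktemp rather than being called ls.tmp.

```shell
#!/bin/sh
# Hypothetical stand-in for "ls -l ~": three made-up lines of output.
sample() {
    printf '%s\n' 'Desktop' 'Mail' 'Projects'
}

# Two-step version: write to an interim file, then read it back in.
tmpfile=$(mktemp)
sample > "$tmpfile"
via_file=$(wc -l < "$tmpfile")
rm -f "$tmpfile"

# Pipe version: the stdout of sample goes straight to the stdin of wc.
via_pipe=$(sample | wc -l)

echo "via file: $via_file"
echo "via pipe: $via_pipe"
```

Both versions should report the same line count; the pipe simply does it without leaving anything on disk.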
Introducing the filter
A pipe can pass the standard output of one operation to the standard input of another, but a filter can modify the stream. A filter takes the standard input, does something useful with it, and then returns it as a standard output. Linux has a large number of filters. Some useful ones are the commands awk, grep, sed, spell, and wc.
If we look back at our pipe example from above, we see that it gives output something like:
drwxr-xr-x 4 bainm users 4096 2005-06-05 16:31 Desktop/
drwxr-xr-x 5 bainm users 4096 2004-11-15 00:00 GNUstep/
drwx------ 11 bainm users 4096 2005-06-04 18:02 Mail/
-rw-r--r-- 1 bainm users 10240 2005-01-06 20:36 New_database.kexi
drwxr-xr-x 5 bainm users 4096 2005-05-27 12:53 OpenOffice.org1.1.2/
-rwxr-xr-x 1 bainm users 548788 2004-10-20 19:45 Project1*
drwxr-xr-x 3 bainm users 4096 2004-10-18 10:52 Projects/
-rw-r--r-- 1 bainm users 4242 2004-10-20 19:45 Unit1.dcu
drwxr-xr-x 3 bainm users 4096 2005-05-24 11:59 XamXpm/
drwxr-xr-x 11 bainm users 4096 2005-06-03 10:26 articles/
drwxr-xr-x 2 bainm users 4096 2005-05-30 15:09 backup/
Let’s say that in our email we require only files (not directories) sorted by the largest first and showing only the file name, owner, date last modified, and file size (in that order). To do this, we can use three of the Linux filters: awk (to format), grep (to remove the unwanted lines) and sort (to get the lines in the correct order). In between each filter, we can use a pipe to pass on the result from the individual operations.
The first filter (grep) removes any directories from the list by excluding any lines that start with “d”:
grep -v "^d"
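As a quick check of what this filter does, the following sketch runs it over three lines copied from the sample listing. The variable names are only for illustration.

```shell
#!/bin/sh
# Three lines from the sample listing: two directories and one file.
listing='drwxr-xr-x 4 bainm users 4096 2005-06-05 16:31 Desktop/
drwx------ 11 bainm users 4096 2005-06-04 18:02 Mail/
-rw-r--r-- 1 bainm users 4242 2004-10-20 19:45 Unit1.dcu'

# -v inverts the match, so every line beginning with "d" is dropped.
files_only=$(printf '%s\n' "$listing" | grep -v "^d")

echo "$files_only"
```

Only the Unit1.dcu line survives, because it is the only one whose permissions field does not begin with “d”.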
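The next filter (awk) extracts the required fields (file size, file name, owner, and the date and time last modified). It also places the file size at the start of the line so that the data is ready for sorting. The field numbers assume the listing layout shown above, where the date and time are fields 6 and 7 and the file name is field 8: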
awk '{print $5, $8, $3, $6, $7}'
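To make the field numbers concrete, here is a small sketch that applies the same awk command to one line of the sample listing. It assumes that exact column layout; a listing that prints the date differently would need different field numbers.

```shell
#!/bin/sh
# One line from the sample listing; awk splits it on whitespace.
line='-rwxr-xr-x 1 bainm users 548788 2004-10-20 19:45 Project1*'

# $5 = size, $8 = name, $3 = owner, $6 = date, $7 = time (for this layout).
reordered=$(printf '%s\n' "$line" | awk '{print $5, $8, $3, $6, $7}')

echo "$reordered"
```

The size now leads the line, which is what lets the next filter sort on it.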
The next filter sorts the data numerically, largest first:
sort -nr
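The -n option matters here. The sketch below, using three sizes from the sample listing, compares a numeric reverse sort with a plain reverse sort, which compares the lines character by character instead.

```shell
#!/bin/sh
# File sizes taken from the sample listing, one per line.
sizes='4242
548788
10240'

numeric=$(printf '%s\n' "$sizes" | sort -nr)   # compares the values as numbers
lexical=$(printf '%s\n' "$sizes" | sort -r)    # compares character by character

echo "sort -nr:"
echo "$numeric"
echo "sort -r:"
echo "$lexical"
```

Without -n, 4242 is placed above 10240 because “4” sorts after “1”, which is not the order we want for file sizes.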
And the final filter (another awk) formats the data ready to be emailed, moving the file size to the end of the line and separating the fields with tabs:
awk '{print $2 "\t" $3 "\t" $4, $5 "\t" $1}'
Finally, all we have to do is join the filters together with pipes:
ls -l ~ |
grep -v "^d" |
awk '{print $5, $8, $3, $6, $7}' |
sort -nr |
awk '{print $2 "\t" $3 "\t" $4, $5 "\t" $1}' |
mail -s "File List" info@markbain-writer.tk
The result is something like:
backup.zip bainm 2005-05-30 13:03 1139563
Project1* bainm 2004-10-20 19:45 548788
Delphi_job_spec.rtf bainm 2004-10-14 13:37 217524
output.ps bainm 2004-12-01 21:22 166465
print.pdf bainm 2005-03-06 20:50 47266
kstars.png bainm 2005-03-05 17:35 20586
driving.htm bainm 2004-11-04 21:46 14977
comp.htm* root 2004-08-05 18:29 11101
New_database.kexi bainm 2005-01-06 20:36 10240
projections.sxc bainm 2004-12-21 13:33 7597
testhtml.sxw bainm 2005-01-06 11:33 5529
The pipes and filters allow us to create an elegant piece of scripting. Now, instead of five individual commands, we have a single, flowing process.
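If you would like to try the whole chain without sending mail or depending on the contents of your own home directory, a sketch along these lines may help. The sample_listing function is a hypothetical stand-in for ls -l ~ that replays a few lines of the sample listing, the result is printed rather than piped to mail, and the separators are written as \t, which is awk’s escape for a tab.

```shell
#!/bin/sh
# Hypothetical stand-in for "ls -l ~": four lines from the sample listing.
sample_listing() {
    printf '%s\n' \
        'drwxr-xr-x 4 bainm users 4096 2005-06-05 16:31 Desktop/' \
        '-rw-r--r-- 1 bainm users 10240 2005-01-06 20:36 New_database.kexi' \
        '-rwxr-xr-x 1 bainm users 548788 2004-10-20 19:45 Project1*' \
        '-rw-r--r-- 1 bainm users 4242 2004-10-20 19:45 Unit1.dcu'
}

# Same filters as in the article: drop directories, reorder, sort, format.
report=$(
    sample_listing |
    grep -v "^d" |
    awk '{print $5, $8, $3, $6, $7}' |
    sort -nr |
    awk '{print $2 "\t" $3 "\t" $4, $5 "\t" $1}'
)

# Print the report instead of mailing it.
printf '%s\n' "$report"
```

The directory is dropped and the three files come out largest first, each as name, owner, date and time, and size.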
Some useful filters
There are many Linux commands that are filters, in addition to awk, grep, and sort. Two filters to consider are tr (translate) and sed (stream editor). Both commands allow you to modify the stream: tr for simple changes and sed for the more complex. For example, you can use tr 'a-z' 'A-Z' to convert everything to uppercase, or sed 's/\*//g' to remove the stars from the names of executable files. Quoting the arguments stops the shell from trying to expand them as file name patterns.
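Here is a short sketch of both filters applied to a single file name from the listing. The arguments are quoted so that the shell passes them through untouched, and the variable names are only for illustration.

```shell
#!/bin/sh
# A file name carrying the trailing star that marks an executable.
name='Project1*'

# tr maps each lowercase letter to its uppercase counterpart.
upper=$(printf '%s\n' "$name" | tr 'a-z' 'A-Z')

# sed deletes every literal star; the backslash stops * acting as a repeat operator.
no_star=$(printf '%s\n' "$name" | sed 's/\*//g')

echo "$upper"
echo "$no_star"
```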
Another filter to consider is tee, which enables you to split a stream between stdout and a file. For example:
ls -l | tee file.lst | wc -l
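This will create a file (file.lst) containing the result from ls -l and will display the number of lines in that listing on the screen (or pass it on to another filter, if you require). On most systems the count includes the “total” line that ls -l prints first, so it is one more than the number of entries.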
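The splitting behaviour can be checked with a small sketch like the one below. It feeds tee three made-up lines instead of a real directory listing and works in a scratch directory created with mktemp -d, so no existing file.lst is overwritten.

```shell
#!/bin/sh
# Work in a scratch directory so that nothing in the current one is touched.
workdir=$(mktemp -d)

# tee saves its input to file.lst and also passes it on to wc.
count=$(printf '%s\n' 'alpha' 'beta' 'gamma' | tee "$workdir/file.lst" | wc -l)

# Read the saved copy back to confirm it matches what went down the pipe.
saved=$(cat "$workdir/file.lst")
rm -rf "$workdir"

echo "lines counted: $count"
echo "lines saved:"
echo "$saved"
```

The same three lines end up both in the file and in the count, which is the point of tee: one stream, two destinations.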
Creating your own filters
So far, we have learned how to use pipes and simple filters together. The next step is to learn how to build a filter for a specific job. The above example will send a list of all the files in the home directory. However, let’s assume that we’re interested only in files that are greater than 10,000 bytes in size. We need to add in a new filter:
ls -l ~ |
grep -v "^d" |
awk '{print $5, $8, $3, $6, $7}' |
only_big_files |
sort -nr |
awk '{print $2 "\t" $3 "\t" $4, $5 "\t" $1}' |
mail -s "File List" info@markbain-writer.tk
The filter must first read the standard input. To do this, enclose any functionality within a “while read” loop. Any fields passed to the filter must be placed into variables:
while read SIZE FILE NAME DATE TIME
do
    ...
done
Having read the standard input, we can now create the body of the filter. Here we simply check to see if the file size is greater than 10,000 bytes. If it is, we send the data to the standard output. If not, we move on to the next line:
if [ "$SIZE" -gt 10000 ]
then
    echo "$SIZE" "$FILE" "$NAME" "$DATE" "$TIME"
fi
The completed filter is:
function only_big_files {
    while read SIZE FILE NAME DATE TIME
    do
        if [ "$SIZE" -gt 10000 ]
        then
            echo "$SIZE" "$FILE" "$NAME" "$DATE" "$TIME"
        fi
    done
}
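To try the filter on its own, a sketch such as the following can be pasted into a script. It defines the function with the name() form, which plain sh also accepts (an assumption about where you will run it; the function keyword works in bash), and feeds it three lines built from the sample listing in the “size name owner date time” layout the filter expects.

```shell
#!/bin/sh
# Portable spelling of the filter from the article.
only_big_files() {
    while read SIZE FILE NAME DATE TIME
    do
        if [ "$SIZE" -gt 10000 ]
        then
            echo "$SIZE" "$FILE" "$NAME" "$DATE" "$TIME"
        fi
    done
}

# Sample input: two files above the 10,000-byte threshold and one below it.
big=$(printf '%s\n' \
    '548788 Project1* bainm 2004-10-20 19:45' \
    '4242 Unit1.dcu bainm 2004-10-20 19:45' \
    '10240 New_database.kexi bainm 2005-01-06 20:36' |
    only_big_files)

echo "$big"
```

Only the Project1* and New_database.kexi lines are passed through; the 4,242-byte file is silently dropped.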
You could, of course, use the awk filter to do the same:
awk '{if ($1>10000) {print $0}}'
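As a side note, awk prints the current line by default when a pattern is given without an action, so the condition alone is enough. The sketch below runs both spellings over two sample lines; the variable names are only for illustration.

```shell
#!/bin/sh
# Two lines in the "size name owner date time" layout used by the pipeline.
input='548788 Project1* bainm 2004-10-20 19:45
4242 Unit1.dcu bainm 2004-10-20 19:45'

# Explicit form, as written in the article.
long_form=$(printf '%s\n' "$input" | awk '{if ($1>10000) {print $0}}')

# Pattern-only form: the default action is to print the matching line.
short_form=$(printf '%s\n' "$input" | awk '$1 > 10000')

echo "$long_form"
echo "$short_form"
```

Both commands print the same single line, so which one you use is a matter of taste.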
Final thoughts
I find pipes and filters invaluable. Their uses range from simple processes (such as ls -l | more) through to the highly complex. Like so many things in Linux, you’ll wonder how you ever managed to live without them.