What you need to know to write man pages

Author: Peter Seebach

Despite the frequent attempts at competing formats, manual pages remain
the core of standard Unix documentation. Users often prefer manual pages
to other forms of documentation, and a well-written manual page is a valuable
addition to any open-source project. This article discusses the issues faced
in developing and writing manual pages, and a few quirks that you may
encounter along the way.

Why man pages?

Man pages, some assert, are obsolete. They’re old-fashioned,
they don’t have a lot of neat modern features, and they’re written in nroff, an old
and strange markup language.

All of this is true; nonetheless, man pages are the primary source of
documentation for Unix-like systems, and have been for a long time. The man program is installed as standard on
pretty much every Unix-like system ever shipped; competing formats may
require users to install and learn new, unfamiliar tools. Man
pages are well-suited to being printed without substantial effort; you
don’t have to format them, you can just feed them to a printer. Furthermore, everyone knows how to use more to read
files.

A user having trouble with a command is very likely to start with
man command when looking for help. A man page that directs the user toward
the format-of-the-week is unhelpful. Even if there’s a compelling reason
to use another documentation format (such as better markup or support for
hyperlinking), providing man pages is a good thing.

Nroff, troff, and macro packages

Man pages are written in a markup language, generally referred to
as nroff; in fact, there are other processors for it (such as troff or groff),
but it’s all the same language. Nroff is a markup language, but it’s a bit
more primitive than HTML or SGML. On the other hand, it’s a
macro language, so arbitrarily complex things can be done in it. There are
sets of macros designed to support certain document types. The flag given
to the traditional nroff program to specify a macro set is “-mNAME,” so many
names are chosen to look natural with this; for instance, old-style man pages
were written using the “tmac.an” set of macros, invoked with nroff -man.

There’s a newer set of macros called doc (tmac.doc) which come with groff,
and a strange superset called mandoc (tmac.andoc) found on 4.4BSD systems
which magically guesses whether you really want the old “man” macros or
the new “doc” macros. We’ll cover the older macro
set in passing, simply because it’s arcane and poorly documented, and it
is therefore most likely that you will suddenly be tasked with writing in it.
On systems using groff, there is usually a man page for these macros,
installed under the name groff_man. The doc macros (also known as mdoc) are
probably a lot cleaner, but not everyone has them. Such is the price of progress.

In general, troff macros are introduced by starting a line with a period.
For instance, an old man page might start out like this:

	.TH C 1 local
	.SH NAME
	c \- columnize input or files

This takes a bit of interpretation. The TH macro introduces a manual
page. TH probably stands for “title header.” It indicates the title
of the man page (“C”), the section (“1”), and up to three extra flags;
in this case, the “local” flag is used to indicate a local command, as
opposed to a standard part of the system. (As a trivia point, on a
system supporting the mandoc macro set, the TH macro may also load the “old”
macros from the traditional man macro set.) The manual section is
referenced elsewhere by referring to a command as name(section); for
instance, ls(1), or printf(3).

In the doc macro set, you see introductions like this:

	.Dd September 22, 2003
	.Dt LS 1
	.Os
	.Sh NAME
	.Nm ls
	.Nd list directory contents

Macro names still start with a period at the beginning of a line.
The Dd macro is the magic cookie the mandoc macros use to identify a file
formatted using the doc macros. Note that the doc macros have specific
sub-macros for the name and description of a command. In both macro sets,
the name of the man page, and its section, are introduced very early on.

Pages within sections

The online manual is divided into sections. The exact scope of
manual sections varies slightly from one system to another; a
typical layout is this:

	1	commands
	2	system calls
	3	library functions
	4	device drivers
	5	file formats
	6	games
	7	miscellaneous
	8	system utilities

There are often additional subsections; for instance, programs installed
in /usr/local might get man pages in section 1l. There may also be
additional sections; NetBSD documents kernel internals in section 9, for
instance.

Sections within a page

Confusingly, just as each manual page is in a “section” of the manual (e.g.,
section 1 for command-line utilities, or section 3 for library functions),
each manual page consists of several named sections.

The SH macro (Sh in the doc macro set) introduces a section header.
The exact set of section headers varies from one system to another.
A reasonably complete set is NAME, SYNOPSIS, DESCRIPTION, OPTIONS,
RETURN VALUE, ERRORS, DIAGNOSTICS, EXAMPLES, ENVIRONMENT, FILES,
CAVEATS, BUGS, RESTRICTIONS, NOTES, SEE ALSO, AUTHOR, and HISTORY.

The NAME section is the one used for the whatis database, and also for
the man -k or apropos commands. It should give a one-line summary of what
the page describes: concise, but informative. Made-up words
like “columnize” are probably bad style.

The SYNOPSIS section describes the usage of the command. In general, it
should look very much like a traditional usage message. Here’s how it might
look in source, with the old man macro set:

	.SH SYNOPSIS
	.B c
	.RB [ " \-hV123456789 " ]
	.RB [ \-w\ width ]
	.RB [ \-c\ columns ]
	.RB [ \-n\ spacing ]
	[
	.I "name \&..."
	]

By convention, optional arguments are surrounded by square brackets. Options which
take arguments are traditionally separated out and given meaningful names.
The RB macro alternates between “roman” (plain) and bold styling. For
instance, this line:

	.RB [ \-n\ spacing ]

prints a square bracket in plain font, the text “-n spacing” in bold,
and then the closing bracket plain again. The backslash before the space
is used to make the whole “-n spacing” into a single argument; it would
also work to write it this way:

	.RB [ "\-n spacing" ]

There is a related macro, BR, which also alternates between bold and roman,
starting with bold.

The doc macro set improves substantially on this:

	.Sh SYNOPSIS
	.Nm
	.Op Fl FINZ
	.Op Fl a Ar maxcontig
	.Op Fl B Ar byte-order
	.Op Fl b Ar block-size

The Nm macro prints the name of the command; it doesn’t need to be repeated
here, because an earlier macro saved it. The Op macro surrounds its arguments
in square brackets, the Fl macro indicates flags, and the Ar macro indicates a
named argument. This is a lot easier to write. The essential content remains
the same.

The DESCRIPTION section gives a brief summary of what the man page
describes; for instance, for a command or library function, it
would be a summary of functionality. For a data structure or file
format, it would be a summary of the structure of data and kind of
data stored. Some man pages include descriptions of options in
this section. Otherwise, they’re put in a separate section titled
OPTIONS, which should start with a clear summary of what the manual
page describes.

The formatting for options is itself a little weird. A typical OPTIONS
section might look a bit like this:

	.TP
	.B \-h \-\-horizontal
	Items go left to right, then top to bottom, rather than
	top to bottom, then left to right.  For instance, the 2nd
	item will be at the beginning of the 2nd column, rather
	than being the 2nd item in the 1st column.
	.TP
	.B \-V \-\-version
	Print version info and exit.

The TP macro introduces an indented paragraph with a label, suitable
for option lists. The B macro puts its arguments in bold print. With the doc
macros, it’s formatted like this:

	.Bl -tag -width indent
	.It Fl A
	List all entries except for
	.Ql \&.
	and
	.Ql \&.. .
	Always set for the super-user.
	.It Fl a
	Include directory entries whose names begin with a
	dot
	.Pq Sq \&. .
	.El

The list is introduced with a Bl macro, and ended with El. The It macro
introduces an item, and the Fl macro introduces a flag, just as it did in the
SYNOPSIS section.

Every option should be described. Avoid the mistake of not telling someone
what the option really does; give some idea of when an option is useful and
what its effects are. Merely saying that the “-f” option toggles the “foo”
setting doesn’t help the reader.

Functions and command-line utilities should generally describe their return
values or exit statuses; this is what the RETURN VALUE section is used for.
This section is often omitted for command-line utilities that return 0 on
success. Some man pages merge this into the DIAGNOSTICS section.

ERRORS and DIAGNOSTICS should describe any possible error indications
a program or function can yield. By convention, programs have diagnostics,
but functions have errors. Error messages should be explained in
reasonable detail. For library functions or system calls, return codes
should be described, and so should any changes that may be made to errno.
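
As a rough sketch, a DIAGNOSTICS section in the old man macro set might look
something like the following. The message text and the exit statuses are
invented for illustration; they are not taken from any real utility.

	.\" Hypothetical message and exit statuses, for illustration only.
	.SH DIAGNOSTICS
	.TP
	.B "c: invalid width"
	The argument to the
	.B \-w
	option was not a positive number.
	.PP
	The utility exits 0 on success, and >0 if an error occurs.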

The ENVIRONMENT section should describe any way in which environment variables
affect the behavior of a program. For instance, does it care about PATH, or
TMPDIR? Many GNU utilities, for instance, follow the POSIX specification
completely only when the environment variable POSIXLY_CORRECT has been set.
This is where such behaviors should be documented.
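
One plausible way to lay this out with the old man macros is one TP entry per
variable, as in the sketch below. The variables and their effects are
assumptions made up for the example; document whatever your program
actually consults.

	.\" Hypothetical: the variables and effects depend on the program.
	.SH ENVIRONMENT
	.TP
	.B COLUMNS
	If set to a positive number, used as the screen width
	instead of the default.
	.TP
	.B TMPDIR
	Directory in which temporary files are created.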

Similarly, the FILES section should describe any files a program or function
interacts with, especially any that are likely to be modified.

The SEE ALSO section should cross-reference other man pages that may be
relevant to a user reading this man page. For instance, on NetBSD,
the man page for ls(1) has SEE ALSO references for:

chflags(1), chmod(1), stat(2), getbsize(3), dir(5), symlink(7), sticky(8)
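
With the old man macros, such a list is commonly written with the BR macro,
which puts the page name in bold and the section in roman. A minimal sketch
covering the first few entries (not copied from the NetBSD source) might be:

	.SH SEE ALSO
	.BR chflags (1),
	.BR chmod (1),
	.BR stat (2)

In the doc macro set, the Xr macro serves the same purpose:

	.Sh SEE ALSO
	.Xr chflags 1 ,
	.Xr chmod 1 ,
	.Xr stat 2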

BUGS should include not just crashing problems but general limitations.
For instance, the “c” utility described here assumes an 80-column screen; while
it tries to get the right value, it may fail, and this is a limitation,
so it’s documented in BUGS. Related would be RESTRICTIONS, which are, to
quote the pod2man man page, “bugs you don’t plan to fix.” Also related
are CAVEATS, sometimes called WARNINGS, which are things to watch out for:
behaviors that follow from how the program is designed, but which may not
be what a user wants.

If present, a HISTORY section should describe where a command comes from;
for instance, what Unix-like system, and what version, it first appeared in.
The AUTHOR section should be used to identify the author or authors of the command.

The STANDARDS section should describe what standards, if any, something
complies with. For instance, the manual page for printf(3) should say
which version of the C standard the printf routine is compliant with. This
section should also indicate whether any functionality provided is an
extension to such standards; users may wish to avoid extensions when writing
portable programs.

Most man pages benefit a lot from an EXAMPLES section. Whatever
you’re documenting, show a couple of sample usages. For programs with
many options, show how some common ones interact. [Editor’s note: This is a huge pet peeve of mine. Too few man pages provide any examples at all. Man writers, take this advice to heart!]
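
For the “c” utility, an EXAMPLES section in the old man macros could be
sketched roughly as follows. The pipeline is an assumption about how the
utility would typically be used; the nf and fi requests turn filling off and
on so the command line is printed exactly as written.

	.\" Hypothetical usage; assumes c reads items from standard input.
	.SH EXAMPLES
	Print a directory listing in three columns:
	.PP
	.RS
	.nf
	ls | c \-c 3
	.fi
	.RE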

Finally, if you really have to say something but it doesn’t fit anywhere
else, you can make a section called NOTES.

Weird macros

Some of the macros used are a bit confusing, or may have unusual limitations.
For instance, on some systems, the RB and BR macros may take a maximum
of 6 arguments. The same may apply to the BI, IB, IR, and RI macros (which
alternate bold or roman text with italicized text). This can require special
considerations when writing descriptions of C functions which take a number
of arguments. One convention used in a lot of man pages is to have the
function name and argument types in bold, and argument names in italics.
This can result in needing to split a line up. For instance:

	.BI "int szncmp(void *" "s1" ", void *" "s2"\c
	.BI ", size_t " "len" );

The “\c” at the end of the first line tells the macro processor that the
newline should not be treated as introducing whitespace between the last
argument on the first line, and the first argument on the second line.

There are additional macros for some systems. For instance,
on old SunOS systems, there’s an IX macro, used something like this:

.IX "mem2sz()" "" "makes sz from mem"

This was used in section 3 man pages to help populate an index; it doesn’t
appear to have any effect in current man page systems. The pod2man utility
generates these for section headings.

In addition to the font-selection macros, there are font escapes you can use
within a line. For instance, these two lines look the same:

	.RB [ \-s\ step ]
	[ \fB\-s step\fR ]

The most likely ones to use are \fB (bold), \fI (italic), and \fR (roman, or
plain).

Alternatives to nroff

It’s pretty easy to imagine not wanting to write in nroff, especially not
in the archaic man macro set. Some people swear off producing man
pages at all. This is very annoying — don’t do it. Here are two
alternatives to consider:

1. The doc macro set used with groff is widely available, and fairly
friendly.
2. Perl’s POD documentation format converts reasonably well to manual pages.

Even if you’re stuck writing in the old man macro set, it’s still quite
possible to produce good, readable, solid documentation. Don’t be afraid
to copy some bits from existing manuals!
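
As a starting point, here is a minimal skeleton in the old man macro set,
pieced together from the fragments shown earlier for the “c” utility. The
description of what the program does, and the file name used below, are
assumptions for the sake of the example.

	.\" Skeleton only; the behavior described is assumed for illustration.
	.TH C 1 local
	.SH NAME
	c \- columnize input or files
	.SH SYNOPSIS
	.B c
	.RB [ \-c\ columns ]
	[
	.I "name \&..."
	]
	.SH DESCRIPTION
	.B c
	reads the named files, or standard input if none are given,
	and prints the items it finds in columns.
	.SH OPTIONS
	.TP
	.B \-c columns
	Use the given number of columns.
	.SH BUGS
	The screen is assumed to be 80 columns wide.

If this were saved as c.1, it could be previewed with nroff -man c.1 | more
before being installed.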

Many open source developers seem a little shy about documentation.
Documentation can be a bit hard to write, and harder to write well. Putting
out documentation sometimes seems like a way to ask people to waste your time
with typo reports. It’s still worth it. Remember that bug reports for
well-documented code will avoid the things you put in the CAVEATS section;
furthermore, users will understand what you thought this widget did in the
first place.

Don’t think of documentation as time taken away from developing a product;
think of it as time spent figuring out what exactly you’re developing.
Documentation is as much part of the final product as anything else; without
documentation, a product is inaccessible to users.