How do I convert Linux man pages to HTML without using groff?

There are plenty of alternatives, such as roffit, troff, and man2html. There are also Perl-based online manpage browsers, such as manServer.

My favorite is pandoc, though sadly it doesn't seem to support roff input by default (you can probably still use it if you need to chain multiple transformation filters together).
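
Whether pandoc can read roff may depend on the version: newer pandoc releases list a "man" input format. A minimal sketch, assuming your pandoc build has that reader (check with pandoc --list-input-formats); the function name pandoc_man_to_html is made up for illustration:

```shell
# Sketch only: assumes a pandoc build whose --list-input-formats output
# includes "man". pandoc_man_to_html is a hypothetical helper name.
pandoc_man_to_html() {
    # $1: path to an uncompressed man page source file.
    # -s asks pandoc for a standalone HTML document, written to stdout.
    pandoc -f man -t html -s "$1"
}

# Example (decompress first, since the function expects plain roff source):
# zcat /usr/share/man/man1/dd.1.gz > /tmp/dd.1
# pandoc_man_to_html /tmp/dd.1 > /tmp/dd-pandoc.html
```

If "man" is missing from the list, this won't work and one of the tools below is the better bet.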

man2html example:

zcat /usr/share/man/man1/dd.1.gz \
    | man2html \
    | sudo tee /var/www/html/dd.html

roffit example:

git clone https://github.com/bagder/roffit.git
cd roffit
zcat /usr/share/man/man1/dd.1.gz \
    | perl roffit \
    | sudo tee /var/www/html/dd-roffit.html
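
The same pipeline extends to a whole directory. A rough sketch, assuming the pages are either plain or gzip-compressed; convert_dir is a hypothetical helper, and the converter command is passed in as arguments so the loop works with man2html, "perl roffit", or anything else that reads roff on stdin and writes HTML on stdout:

```shell
# Sketch: convert every man page in one directory to HTML.
# convert_dir is a made-up helper name, not part of any of these tools.
convert_dir() {
    src_dir=$1 out_dir=$2
    shift 2                            # remaining arguments: the converter command
    mkdir -p "$out_dir" || return 1
    for page in "$src_dir"/*; do
        [ -f "$page" ] || continue     # skip subdirectories and empty globs
        name=$(basename "$page" .gz)   # dd.1.gz -> dd.1
        # Decompress .gz pages, pass others through unchanged,
        # then feed the source to the converter on stdin.
        case "$page" in
            *.gz) gzip -dc -- "$page" ;;
            *)    cat -- "$page" ;;
        esac | "$@" > "$out_dir/$name.html"
    done
}

# Example (assumes man2html is installed):
# convert_dir /usr/share/man/man1 ./html man2html
```

Writing into a web root such as /var/www/html would still need the appropriate permissions, as in the sudo tee examples above.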

Other tools:

  • troffcvt does about the same thing.
  • The 'real' troff - Gonna try out http://heirloom.sourceforge.net/doctools.html. I suspect schily has OpenSolaris and friends in mind :-).

This first bit is a shameless rip from the official website:

mandoc is a suite of tools compiling mdoc, the roff macro language of choice for BSD manual pages, and man, the predominant historical language for UNIX manuals. It is small, ISO C, ISC-licensed, and quite fast. The main component of the toolset is the mandoc utility program, based on the libmandoc validating compiler, to format output for UNIX terminals (with support for wide-character locales), XHTML, HTML, PostScript, and PDF.

mandoc has predominantly been developed on OpenBSD and is both an OpenBSD and a BSD.lv project. We strive to support all interested free operating systems, in particular FreeBSD, NetBSD, DragonFly, illumos, Minix 3, and GNU/Linux, as well as all systems running the pkgsrc portable package build system. To support mandoc development, consider donating to the OpenBSD foundation.

pacman informs me that my locally installed mdocml package is 3.28 MB, and that it includes the following binaries in /usr/bin:

/usr/bin/demandoc
/usr/bin/makewhatis
/usr/bin/mandoc
/usr/bin/mapropos
/usr/bin/mman
/usr/bin/mwhatis

With it I can do:

mman -Thtml mman >/tmp/html
firefox file:///tmp/html

You can apply your own stylesheets as you like. All of the documentation is online as well, and I think all of it is generated with mandoc too.
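
For the stylesheet part, mandoc's HTML output accepts a style= output option. A minimal sketch, assuming mandoc is installed; render_html is a made-up wrapper name and mystyle.css is a placeholder for your own stylesheet:

```shell
# Sketch: render one man page to HTML with mandoc, optionally linking a stylesheet.
# render_html is a hypothetical helper; -T html and -O style= are mandoc options.
render_html() {
    src=$1          # path to an uncompressed man page source file
    css=${2:-}      # optional stylesheet URL or path to reference from the HTML
    if [ -n "$css" ]; then
        mandoc -T html -O "style=$css" "$src"
    else
        mandoc -T html "$src"
    fi
}

# Example (mystyle.css is a placeholder name):
# render_html /tmp/dd.1 mystyle.css > /tmp/dd.html
```

The stylesheet is only linked from the generated page, not embedded, so it has to be reachable from wherever the HTML ends up being served.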


Firstly, it should be noted that there is more than one program called man2html.

One utility called man2html is a C program originally written in the late 1990s by Richard Verhoeven at the Eindhoven University of Technology. Its internals are substantially quirky. However, it has the advantage that it works with the raw man page source, rather than troff or nroff output. This program was added to Federico Lucifredi's man suite.

The program understands the semantics of the man and mandoc macros, and outputs a reasonable HTML structure. For instance when you use indented paragraphs, like this:

.IP word
Definition of
word.
.RS

the program will put out an HTML definition list.

I maintain one very large man page (most of a megabyte of source, and nearly 400 pages long, when converted to letter size PDF by groff):

$ ls -l txr.1
-rw-rw-r-- 1 kaz kaz 980549 Jan  3 11:38 txr.1

When I needed to convert this to HTML, some five years ago, the only thing I found which did a reasonable job was the man2html C program, plus post-processing of its output to "season to taste".

Eventually, I wanted a much better quality HTML document, so I started writing troff macros. The limitations of the C program became painfully apparent, so I forked it. On my git site, you can find a git repo with 30 patches to man2html. These patches fix a number of bugs, and enhance the program with a much improved ability to interpret troff macros, conditionals, loops and other constructs. I also added an M2 register by means of which you can write code which detects that it's running under man2html and can conditionally do some things differently (scroll down for an example). I also added a .M2SS command which lets you emit a custom HTML header section.

My large manpage is hosted here. This is produced with man2html, post-processed by my genman.txr program, which rearranges the sections and adds hyperlinks throughout the document. It also rewrites the internal links in the table of contents to be stable URLs (based on hashing rather than arbitrary enumeration) and makes the table of contents collapsible via some JavaScript.

The exact commands used by my Makefile:

man2html txr.1 | ./txr genman.txr - > txr-manpage.html
tbl txr.1 | pdfroff -man --no-toc - > txr-manpage.pdf

For an example of how the output is conditionally different between HTML and nroff we can look at a section of the man output:

       9.19.4 Macro defstruct

       Syntax:

                (defstruct {<name> | (<name> <arg>*)} <super>
                   <slot-specifier>*)

              The  defstruct  macro defines a new structure type and registers
              it under <name>, which must be a bindable symbol,  according  to
              the  bindable  function. Likewise, the name of every <slot> must
              also be a bindable symbol.

Above, note how parameters are denoted in <angle> <brackets>. In the HTML version, they appear in italics.

The syntax section appears in the source code like this:

.coNP Macro @ defstruct
.synb
.mets (defstruct >> { name | >> ( name << arg *)} < super
.mets \ \  << slot-specifier *)
.syne

which is all custom macros defined in the same document. Under .mets, < b means b is a meta-syntactic variable. >> a b means a is a concrete syntax, next to which is the meta-syntactic b without any intervening space, and <> a b c means b is a meta-syntactic crunched between a and c literals.

My improved version of man2html understands the fairly complicated macro which implements these markup conventions.

Also, note how the manual has automatically numbered sections: that's all done by troff code, which man2html understands.