A better Wikipedia page to PDF tool?

Oh, this question comes exactly at the right time :-)

Only last night I created a PDF from several Wikipedia articles using the wonderful Prince utility. The command is still in my bash history:

time prince \
   --verbose \
   --no-author-style \
   --style=http://www.princexml.com/howcome/2008/wikipedia/wiki2.css \
     http://en.wikipedia.org/wiki/Color_management \
     http://en.wikipedia.org/wiki/Gamut \
     http://en.wikipedia.org/wiki/RGB \
     http://en.wikipedia.org/wiki/CMYK \
     http://en.wikipedia.org/wiki/Color_space \
     http://en.wikipedia.org/wiki/ICC_profile \
     http://en.wikipedia.org/wiki/Color_calibration \
     http://en.wikipedia.org/wiki/Linux_color_management \
   --output=prince-colormanagement-wikipedia.pdf

It took only 3 minutes to download all the required files and compose the PDF. The command uses a remote CSS stylesheet, and Prince also fetches the freely available Gentium fonts online.
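If your list of articles grows, it may be easier to keep the titles in a bash array and let a loop build the URLs. This is a minimal sketch that reuses only the Prince options from the command above; the variable names are my own, and it uses https URLs (Wikipedia redirects plain http to https anyway). It only calls prince if it is found in your PATH:

```shell
#!/usr/bin/env bash
# Build the same multi-article Wikipedia PDF from a list of article titles.
# Only the Prince options from the original command are used; names are illustrative.

articles=(
  Color_management Gamut RGB CMYK Color_space
  ICC_profile Color_calibration Linux_color_management
)
style="http://www.princexml.com/howcome/2008/wikipedia/wiki2.css"
output="prince-colormanagement-wikipedia.pdf"

# Turn each article title into a full Wikipedia URL.
urls=()
for title in "${articles[@]}"; do
  urls+=("https://en.wikipedia.org/wiki/${title}")
done

if command -v prince >/dev/null 2>&1; then
  # Multiple inputs end up in one PDF, in the order given.
  prince --verbose --no-author-style --style="$style" \
    "${urls[@]}" --output="$output"
else
  echo "prince not found in PATH; it would have fetched:" >&2
  printf '  %s\n' "${urls[@]}" >&2
fi
```

Adding another article to the book is then just one more word in the articles array.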

Prince can be downloaded and used without registration. However, it is still commercial software: it may be used free of charge for non-commercial purposes, but in that case you'll have to live with a small, rather unobtrusive logo on the first page of your PDF.

This is the little Prince logo you'll have to live with when you don't pay for the software. It appears in the upper right corner of the first page of your PDFs:

[Image: the small 'Prince' logo in the corner of the first PDF page]

Here is a screenshot of the resulting PDF in Acrobat Reader on Linux, showing a page and the partially expanded bookmarks pane:

[Screenshot: Prince-generated PDF from multiple Wikipedia articles, with bookmarks]

As you can see, the bookmarks for all included articles are there.

As you'll also notice, Prince rendered the stylesheet's two-column layout correctly. (If you know CSS well, you can easily write your own stylesheet with your own font preferences and so on, and style your Wikipedia books to your liking.)
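As a starting point for such a stylesheet, here is a minimal sketch that writes a small CSS file and hands it to Prince via --style. The file name is hypothetical, and the selectors used to hide Wikipedia's navigation, edit links and footer are assumptions about Wikipedia's current HTML, so check them against the page source. Because of --no-author-style, Wikipedia's own CSS is ignored and this file takes over:

```shell
#!/usr/bin/env bash
# Write a small custom stylesheet and use it instead of the remote wiki2.css.
# The Wikipedia selectors below are assumptions; unmatched selectors are ignored.

css_file="my-wikibook.css"   # hypothetical file name

cat > "$css_file" <<'EOF'
@page {
  size: A4;
  margin: 2cm;
}
body {
  font-family: "Gentium", serif;  /* any serif font is used if Gentium is not installed */
  column-count: 2;                /* two-column layout, similar to wiki2.css */
  column-gap: 1.5em;
}
/* Hide navigation, edit links and the footer (assumed id/class names). */
#mw-navigation, #footer, .mw-editsection {
  display: none;
}
EOF

if command -v prince >/dev/null 2>&1; then
  prince --no-author-style --style="$css_file" \
    https://en.wikipedia.org/wiki/Gamut \
    --output=gamut.pdf
else
  echo "prince not found; wrote $css_file only" >&2
fi
```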

Prince is available not only for Linux, but also for Windows, Solaris and Mac OS X.


Update: Just to compare with the features you wanted:

  1. "Change the hyperlinks to bookmarks inside the article for multipage books if the destination page is also in the book."
    • This tool does exactly what you want here.
  2. "Better contents layer."
    • To be honest, I do not understand what you mean by this point. However, since you can write your own stylesheet and have Prince apply it to the output, you have full control over how the content is laid out.
  3. "Automatically update pages, if possible..."
    • You'll have to write your own script around the prince command line to do that. A cron job would check whether any of the Wikipedia articles making up your book has changed and, if so, run the prince command again. The check could use curl and the Wikipedia API to query each article's last-modified date.
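A rough sketch of such a script, under a few assumptions: instead of the last-changed date it compares each article's latest revision ID (the lastrevid field returned by the API's prop=info query), which changes whenever an article is edited; the state-file path, output file name and User-Agent contact string are placeholders; and it parses the JSON reply with grep rather than a real JSON tool such as jq, to keep dependencies minimal:

```shell
#!/usr/bin/env bash
# Rebuild the Wikipedia PDF only when one of its articles has a new revision.
# Assumptions: curl is installed; state file, output name and User-Agent are placeholders.

titles="Color_management|Gamut|RGB|CMYK|Color_space|ICC_profile|Color_calibration|Linux_color_management"
api="https://en.wikipedia.org/w/api.php"
style="http://www.princexml.com/howcome/2008/wikipedia/wiki2.css"
output="prince-colormanagement-wikipedia.pdf"
state_file="${HOME}/.wikibook-revids"                 # hypothetical location
user_agent="wikibook-rebuild/0.1 (you@example.org)"   # placeholder contact

# Read an API reply on stdin; print each page's "lastrevid", one per line, sorted.
extract_revids() {
  grep -o '"lastrevid":[0-9]*' | cut -d: -f2 | sort -n
}

check_and_rebuild() {
  local reply current title
  local -a names urls
  # prop=info reports the latest revision id of every title;
  # redirects=1 makes the API follow redirects to the target article.
  reply=$(curl -fsS -A "$user_agent" --get "$api" \
    --data-urlencode "action=query" \
    --data-urlencode "prop=info" \
    --data-urlencode "redirects=1" \
    --data-urlencode "format=json" \
    --data-urlencode "titles=$titles") || { echo "API request failed" >&2; return 1; }

  current=$(printf '%s' "$reply" | extract_revids)
  if [ -z "$current" ]; then
    echo "no revision ids in API reply" >&2
    return 1
  fi
  if [ -f "$state_file" ] && [ "$current" = "$(cat "$state_file")" ]; then
    echo "no article changed; PDF is up to date"
    return 0
  fi
  if ! command -v prince >/dev/null 2>&1; then
    echo "articles changed, but prince is not in PATH" >&2
    return 1
  fi

  # Turn "A|B|C" into full article URLs and rerun the original command.
  IFS='|' read -ra names <<< "$titles"
  urls=()
  for title in "${names[@]}"; do
    urls+=("https://en.wikipedia.org/wiki/${title}")
  done
  # Remember the revision ids only if the PDF was built successfully.
  prince --no-author-style --style="$style" "${urls[@]}" --output="$output" \
    && printf '%s\n' "$current" > "$state_file"
}

# Run the check only when the script is executed directly, not when sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  check_and_rebuild
fi
```

Saved as, say, ~/bin/rebuild-wikibook.sh and made executable, a crontab entry such as 0 6 * * * $HOME/bin/rebuild-wikibook.sh would run the check every morning at 06:00. The PDF is written to the job's working directory, which for cron is usually your home directory.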
