"Print" webpage to pdf with working hyperlinks

First, the browser you are using matters. Unless it hands the job to the standard OS print dialog, a browser uses its own library to create the PDF, so results vary from one browser to another.

As an experiment, I printed this page using Firefox and Chromium. Firefox did not save any clickable links. Chromium saved about 50% of them, and which ones survived seemed pretty random.
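
One rough way to repeat that comparison is to count the link annotations in each saved file. The sketch below is only a heuristic: count_pdf_links is a made-up helper name, and it assumes the PDF stores its annotations uncompressed, so a count of 0 does not prove that the links are missing.

```shell
# count_pdf_links: hypothetical helper, not part of any PDF tool.
# Counts "/Subtype /Link" entries, which mark clickable link annotations.
# Assumption: the annotations are not hidden inside compressed object streams.
count_pdf_links() {
    # -a reads the binary PDF as text; -o prints each match on its own line.
    grep -a -o '/Subtype */Link' "$1" | wc -l | tr -d ' '
}
```

Running count_pdf_links firefox.pdf and count_pdf_links chromium.pdf (placeholder file names) gives two numbers to compare against the number of links on the page.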

I believe the best solution for you would be to install a browser add-on / extension that will do the job.

A quick search turned up this one for Firefox: the unimaginatively named Print pages to Pdf. Direct link to the latest version: 0.5.0.6.

Creates one Pdf from any amount of open Browsertabs,Bookmarks/-folder, Scrapbook(Plus) pages. This document can be archived, sended [sic] by e-mail or printed out with any standard Pdf Viewer.

If you go through the list of features, you will find what you are looking for:

  • Retains links in the pdf from the content of webpages
  • Supports local links for navigating in the webpage/pdf

I have tested it briefly and it printed the page correctly with all clickable links.


Another option is Chrome's built-in "Save as PDF" virtual printer. It worked for me where the "Print pages to Pdf" Firefox extension mangled the page badly (although the extension did preserve the hyperlinks).


wkhtmltopdf

Building on the Print pages to Pdf suggestion, I recommend the standalone tool wkhtmltopdf.

"c:\Program Files\wkhtmltopdf\bin\wkhtmltopdf" "http://www.example.com" test.pdf

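On Linux or macOS the same call can be wrapped in a small shell function. This is a minimal sketch, assuming wkhtmltopdf is on the PATH; url_to_pdf is a hypothetical name chosen for illustration.

```shell
# url_to_pdf: hypothetical wrapper, not part of wkhtmltopdf itself.
# Usage: url_to_pdf <url> <output.pdf>
url_to_pdf() {
    url=$1
    out=$2
    # Fail early with a clear message if the tool is not installed.
    if ! command -v wkhtmltopdf >/dev/null 2>&1; then
        echo "wkhtmltopdf not found on PATH" >&2
        return 127
    fi
    # Same arguments as the command above: input URL first, output file second.
    wkhtmltopdf "$url" "$out"
}
```

For example, url_to_pdf "http://www.example.com" test.pdf should behave like the Windows command above.
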
Pandoc

wkhtmltopdf didn't work in my case, so I recommend Pandoc instead. It is a bit more complicated to get running. For small documents, you should be able to run:

pandoc http://www.example.org/ -o test.pdf

For UTF-8 documents containing Chinese characters, you have to generate a LaTeX file and compile it with LuaLaTeX yourself, as follows:

pandoc http://blog.fefe.de/ --standalone -o test.tex
lualatex test.tex
lualatex test.tex
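
The LuaLaTeX route can also be wrapped in a small shell function. This is a sketch under assumptions: url_to_pdf_via_lualatex is a made-up name, pandoc and lualatex are expected on the PATH, and --standalone is passed so that the .tex file is a complete document instead of a fragment.

```shell
# url_to_pdf_via_lualatex: hypothetical helper for illustration only.
# Usage: url_to_pdf_via_lualatex <url> <basename>   (writes <basename>.tex and <basename>.pdf)
url_to_pdf_via_lualatex() {
    url=$1
    base=$2
    # Step 1: let Pandoc fetch the page and write a complete LaTeX document.
    pandoc "$url" --standalone -o "$base.tex" || return 1
    # Step 2: compile twice; the second pass resolves references
    # that the first pass only records in the .aux file.
    lualatex "$base.tex" || return 1
    lualatex "$base.tex"
}
```

For example, url_to_pdf_via_lualatex http://blog.fefe.de/ test runs the same three commands as above. Newer Pandoc releases also accept --pdf-engine=lualatex together with -o test.pdf, which may make the manual LaTeX step unnecessary.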

work-in-progress

However, in my specific case, pandoc http://www.w3.org/TR/DOM-Parsing/ -o test.pdf, it led to a LaTeX error:

! LaTeX Error: Too deeply nested.

Therefore, I created the LaTeX file manually:

pandoc http://www.w3.org/TR/DOM-Parsing/ --standalone -o test.tex

Then I had to disable line 78 of test.tex, because an \includegraphics was nested inside an \href.

Even with a hack suggested on Stack Overflow (inserted at line 74, right before \begin{document}), I still could not get it running (pdflatex test).

I opened issue #2438 at Pandoc.