Convert HTML to image

Software Requirements

The following software packages are available for both Windows and Linux systems, and are required for a complete, working solution:

  • gvim - Used to export syntax highlighted source code to HTML.
  • moria - Colour scheme for syntax highlighting.
  • wkhtmltoimage - Used to convert HTML documents to PNG files.
  • gawk and sed - Text processing tools.
  • ImageMagick - Used to trim the PNG and add a border.

General Steps

Here is how the solution works:

  1. Load the source code into an editor that can add splashes of colour.
  2. Export the source code as an HTML document (with embedded FONT tags).
  3. Strip the background-color declarations from the HTML document (to allow transparency).
  4. Convert the HTML document to a PNG file.
  5. Trim the PNG border.
  6. Add a small, 25 pixel border around the image.
  7. Delete temporary files.

The script generates images of identical width for source files whose lines are all 80 characters or shorter. A source file with any line longer than 80 characters produces an image as wide as necessary to hold the longest line.
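The width rule can be tried in isolation. What follows is a minimal sketch, assuming a POSIX shell with awk on Linux (gawk behaves the same way here). The function names longest_line and extent_for are illustrative and do not appear in the batch file. It mirrors the script's decision: measure the longest line, then pad to the fixed 3950 pixel width used by the batch file only when no line exceeds 80 characters.

```shell
#!/bin/sh
# Print the length of the longest line in a file (0 for an empty file).
longest_line() {
  awk 'BEGIN { x = 0 } { if (length($0) > x) x = length($0) } END { print x }' "$1"
}

# Print the ImageMagick option that pads the image to the fixed width,
# or an empty line when some line is longer than 80 characters.
extent_for() {
  if [ "$(longest_line "$1")" -gt 80 ]; then
    echo ""
  else
    echo "-extent 3950x"
  fi
}
```

For example, extent_for somefile.txt prints the -extent option for a file whose lines all fit in 80 columns, and nothing useful otherwise.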

Installation

Install the components into the following locations:

  • gvim - C:\Program Files\Vim
  • moria - C:\Program Files\Vim\vim73\colors
  • wkhtmltoimage - C:\Program Files\wkhtml
  • ImageMagick - C:\Program Files\ImageMagick
  • gawk and sed - C:\Program Files\GnuWin32

Note: ImageMagick has a program called convert.exe, which cannot supersede the Windows convert command. Because of this, the full path to convert.exe must be hard-coded in the batch file (as opposed to adding ImageMagick to the PATH).

Environment Variables

Add the following directories to the PATH environment variable:

"C:\Program Files\Vim\vim73";"C:\Program Files\wkhtml";"C:\Program Files\GnuWin32\bin"

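Before running the script, it helps to confirm that the tools can actually be found on the PATH. A minimal sketch, assuming a POSIX shell on Linux, where the same packages are available. The helper name missing_tools is made up for illustration.

```shell
#!/bin/sh
# Print the name of each given command that cannot be found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

Calling missing_tools gvim wkhtmltoimage gawk sed prints nothing when everything is reachable, and otherwise lists the commands that still need to be installed or added to the PATH.
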
Batch File

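Create a batch file called src2png.bat by copying the contents listed below. Run it with the source file as its argument, for example on itself:

src2png.bat src2png.bat

Supplying any second argument turns on line numbers.

Contents of src2png.bat: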

@ECHO OFF

SET NUMBERS=-c "set number"
IF "%2" == "" SET NUMBERS=

ECHO Converting %1 to %1.html...
gvim -e %1 -c "set nobackup" %NUMBERS% -c ":colorscheme moria" ^
  -c :TOhtml -c wq -c :q

REM Remove all background-color occurrences (without being self-referential)
sed -i "s/background-color: #......; \(.*\)}$/\1 }/g" %1.html

ECHO Converting %1.html to %1.png...
wkhtmltoimage --format png --transparent --minimum-font-size 80 ^
  --quality 100 --width 3600 ^
  %1.html %1.png

move %1.png %1.orig.png

REM If the text file has lines that exceed 80 characters, don't crop the
REM resulting image. (The book automatically shrinks large images to fit.)
REM 3950 pixels is the width of 80 characters at the 80 point font size,
REM plus padding for line numbers.
SET LENGTH=0
FOR /F %%l IN ('gawk ^
  "BEGIN {x=0} {if( length($0)>x ) x=length()} END {print x;}" %1') ^
DO (
  SET LENGTH=%%l
)
SET EXTENT=-extent 3950x
IF %LENGTH% GTR 80 SET EXTENT=

REM Trim the image height, then extend the width for 80 columns, if needed.
REM The result is that all images will be resized the same amount, thus
REM making the font size the same maximum for all source listings. Source
REM files beyond the 80 character limit will be scaled as necessary.
ECHO Trimming %1.png...
"C:\Program Files\ImageMagick\convert.exe" -format png %1.orig.png ^
  -density 150x150 ^
  -background none -antialias -trim +repage ^
  %EXTENT% ^
  -bordercolor none -border 25 ^
  %1.png

ECHO Removing old files...
IF EXIST %1.orig.png DEL /q %1.orig.png
IF EXIST %1.html DEL /q %1.html
IF EXIST sed*. DEL /q sed*.

Improvements and optimizations welcome.

Note: The latest version of wkhtmltoimage properly handles overriding the background colour. Thus the line to remove the CSS for background colours is no longer necessary, in theory.
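For anyone who still needs the background-stripping step, its effect is easy to check on a single CSS rule. A minimal sketch, assuming a POSIX shell and sed. The function name strip_background and the sample rule are made up for illustration, and real TOhtml output may be laid out differently.

```shell
#!/bin/sh
# Apply the batch file's substitution to text on standard input:
# drop a "background-color: #rrggbb; " declaration and keep the rest of the rule.
strip_background() {
  sed 's/background-color: #......; \(.*\)}$/\1 }/g'
}

# Hypothetical sample rule, not taken from real TOhtml output.
printf '%s\n' 'body { background-color: #202020; color: #d0d0d0; }' | strip_background
```

The printed rule keeps its foreground colour but no longer sets a background, which is what lets wkhtmltoimage render onto a transparent canvas.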


According to the wkhtmltoimage man page, the DPI can be set explicitly:

 -d,    --dpi   <dpi>   Change the dpi explicitly

If that does not help, hacking together a simple solution with Qt and its bundled WebKit is pretty straightforward.