Fast PDF to JPG conversion on Linux wanted

Solution 1:

Using Ghostscript directly (instead of ImageMagick's convert command, which calls Ghostscript indirectly) is indeed faster. It also gives you more control over the conversion parameters. Try:

gs \
   -sDEVICE=jpeg   \
   -o bar_%03d.jpg \
   -dJPEGQ=95      \
   -r600x600       \
   -g4960x7016     \
   foo.pdf

where

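  • -o: sets the output path and filename (and makes -dBATCH -dNOPAUSE unnecessary); %03d is replaced by the zero-padded page number
  • -dJPEGQ: sets JPEG quality to 95 (on a scale of 0 to 100)
  • -r: sets resolution to 600dpi
  • -g: sets image size to 4960x7016px (an A4 page at 600dpi)
  • -sDEVICE: sets the output format to JPEG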

This command will probably still be too slow for you and create files bigger than expected. For smaller file sizes and faster execution try this (which probably comes close to the output quality of your convert command line):

gs \
   -sDEVICE=jpeg   \
   -o bar_%03d_200dpi_q80.jpg \
   -dJPEGQ=80      \
   -r200x200       \
   -g1653x2339     \
   foo.pdf

or even

gs \
   -sDEVICE=jpeg   \
   -o bar_%03d_default_a4.jpg \
   -sPAPERSIZE=a4 \
   foo.pdf

(which gives 72dpi resolution, often good enough for most screens and for most web applications).
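
The -g values used above are just the A4 page size multiplied by the resolution given with -r. A minimal sketch of that arithmetic, assuming A4 is taken as 8.267 x 11.693 inches and the result is rounded to the nearest pixel (a4_geometry is a made-up helper name, not a Ghostscript feature):

```shell
#!/bin/sh
# Hypothetical helper: prints the WIDTHxHEIGHT pixel geometry of an A4 page
# at a given resolution, in the form expected by gs -g.
# A4 is assumed to be 8.267 x 11.693 inches, stored in thousandths of an
# inch so that plain integer shell arithmetic is enough.
a4_geometry() {
    dpi=$1
    width=$(( (8267 * dpi + 500) / 1000 ))     # +500 rounds to the nearest pixel
    height=$(( (11693 * dpi + 500) / 1000 ))
    echo "${width}x${height}"
}

a4_geometry 600   # prints 4960x7016
a4_geometry 200   # prints 1653x2339
```

You could then write -r200x200 -g$(a4_geometry 200) instead of hard-coding the pixel size.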

Solution 2:

BTW, one of the reasons ImageMagick is so much slower is that it calls Ghostscript twice. It does not convert PDF => PNG in one go, but uses 2 different steps:

  • it first uses Ghostscript for PDF => PostScript conversion;
  • it then uses Ghostscript for PostScript => PNG conversion.

You can learn about the detailed settings of ImageMagick's "delegates" (the external programs ImageMagick uses, such as Ghostscript) by typing

convert -list delegate

(On my system that's a list of 32 different commands.) Now to see which commands are used to convert to PNG, use this:

convert -list delegate | grep -i png

Ok, this was for Linux. If you are on Windows, try this:

convert -list delegate | findstr /i png

You'll discover that IM produces PNG only from PS or EPS input. So how does IM get (E)PS from your PDF? Easy:

convert -list delegate | findstr /i PDF
convert -list delegate | grep -i PDF

Ah! It uses Ghostscript to make a PDF => PS conversion, then uses Ghostscript again to make a PS => PNG conversion. Works, but isn't the most efficient way if you know that Ghostscript can do PDF => PNG in one go. And faster. And in much better quality.

About IM's handling of PDF conversion to images via the Ghostscript delegate you should know two things first and foremost:

  1. By default, if you don't give an extra parameter, Ghostscript will output images with a 72dpi resolution. That's why people here sometimes suggest adding -density 600 as a convert parameter, which tells Ghostscript to use a 600dpi resolution for its image output.
  2. IM's detour of calling Ghostscript twice, first for PDF => PS and then for PS => PNG, is a real blunder, because you never gain quality in the first step, hardly ever keep it, and very often lose some. Reasons:
    • PDF can handle transparencies, which PostScript cannot.
    • PDF can embed TrueType fonts, which PostScript cannot. etc.pp.
      (Conversion in the opposite direction, PS => PDF, is therefore not that critical....)

That's why I suggest you convert your PDFs to PNG (or JPEG) in one go using Ghostscript directly. And use the most recent version of Ghostscript (8.71 at the time of writing; 9.00 is to be released soon)...
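
For reference, a one-step PDF => PNG conversion with Ghostscript could look roughly like this. It is a sketch, not a tuned command: pdf_to_png is a hypothetical wrapper name, 300dpi is an assumed default, and the two AlphaBits options (anti-aliasing for text and graphics) are optional.

```shell
#!/bin/sh
# Sketch: render every page of a PDF to PNG with a single Ghostscript call,
# without the intermediate PostScript step.
# pdf_to_png is a hypothetical wrapper name; 300 dpi is an assumed default.
pdf_to_png() {
    pdf=$1
    dpi=${2:-300}
    prefix=${pdf%.pdf}                 # foo.pdf -> foo
    gs -sDEVICE=png16m \
       -r"$dpi" \
       -dTextAlphaBits=4 -dGraphicsAlphaBits=4 \
       -o "${prefix}_%03d.png" \
       "$pdf"
}

# Example: pdf_to_png foo.pdf 150   (writes foo_001.png, foo_002.png, ...)
```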


Solution 3:

The program pdftoppm from the poppler package is also able to create JPEGs, and for me it is about twice as fast as using gs as described above:

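pdftoppm -jpeg -r 300 foo.pdf foo

(The last argument is a filename prefix, not a filename: pdftoppm appends the page number and the .jpg extension itself.)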
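
If you only need some of the pages, pdftoppm can also restrict the page range with -f and -l. A small sketch, assuming a hypothetical wrapper name pdf_pages_to_jpeg and 150dpi as an arbitrary default resolution:

```shell
#!/bin/sh
# Sketch: convert only pages FIRST..LAST of a PDF to JPEG with pdftoppm.
# pdf_pages_to_jpeg is a hypothetical wrapper name; 150 dpi is an arbitrary default.
pdf_pages_to_jpeg() {
    pdf=$1
    first=$2
    last=$3
    dpi=${4:-150}
    root=${pdf%.pdf}                   # output prefix: foo.pdf -> foo
    pdftoppm -jpeg -r "$dpi" -f "$first" -l "$last" "$pdf" "$root"
}

# Example: pdf_pages_to_jpeg foo.pdf 1 3
```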

Solution 4:

In my experience, MuPDF is a lot faster than Ghostscript. It is a much newer project without much of the cruft in gs. See whether it fits your use case:

mudraw -w 1024 -h 768 -r 200 -c rgb -o bar%d.png foo.pdf

If you have an older Linux distribution and installed mupdf-tools from the repository, mudraw might still be called pdfdraw.

You then have to convert the PNGs to JPEG, for example with ImageMagick. But it will still be faster than Ghostscript.
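
The PNG => JPEG step can be a simple loop over mudraw's output files. A sketch, assuming ImageMagick's convert is installed; pngs_to_jpegs is a made-up helper name and the quality value of 90 is arbitrary:

```shell
#!/bin/sh
# Sketch: convert every PNG written by mudraw (bar1.png, bar2.png, ...) to JPEG.
# pngs_to_jpegs is a hypothetical helper name; quality 90 is an arbitrary choice.
pngs_to_jpegs() {
    prefix=$1
    for png in "$prefix"*.png; do
        [ -e "$png" ] || continue      # no matches: the glob pattern stays literal
        convert "$png" -quality 90 "${png%.png}.jpg"
    done
}

# Example: pngs_to_jpegs bar
```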