PDF to JPG without quality loss; gscan2pdf

It's not clear what you mean by "quality loss". That could mean a lot of different things. Could you post some samples to illustrate? Perhaps cut the same section out of the poor quality and good quality versions (as a PNG to avoid further quality loss).

Perhaps you need to use -density to do the conversion at a higher dpi:

convert -density 300 file.pdf page_%04d.jpg

(You can prepend -units PixelsPerInch or -units PixelsPerCentimeter if necessary. My copy defaults to ppi.)
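
For example, to be explicit about the units, you could combine the two options like this (a sketch only; adjust the density to whatever your source material needs):

convert -units PixelsPerInch -density 300 file.pdf page_%04d.jpg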

Update: As you pointed out, gscan2pdf (the way you're using it) is just a wrapper for pdfimages (from poppler). pdfimages does not do the same thing that convert does when given a PDF as input.

convert takes the PDF, renders it at some resolution, and uses the resulting bitmap as the source image.

pdfimages looks through the PDF for embedded bitmap images and exports each one to a file. It simply ignores any text or vector drawing commands in the PDF.

As a result, if what you have is a PDF that's just a wrapper around a series of bitmaps, pdfimages will do a much better job of extracting them, because it gets you the raw data at its original size. You probably also want to use the -j option to pdfimages, because a PDF can contain raw JPEG data. By default, pdfimages converts everything to PNM format, and converting JPEG > PPM > JPEG is a lossy process.

So, try

pdfimages -j file.pdf page

You may or may not need to follow that with a separate convert-to-.jpg step, depending on which bitmap formats the PDF was actually using.
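
If you're not sure which formats are embedded, pdfimages can list them first, and any non-JPEG output (PPM/PBM) can then be converted in a second pass. A rough sketch, assuming the same file.pdf and page prefix as above:

pdfimages -list file.pdf
for f in page-*.ppm page-*.pbm; do
  [ -e "$f" ] || continue   # skip if the glob matched nothing
  convert "$f" "${f%.*}.jpg"
done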

I tried this command on a PDF that I had made myself from a sequence of JPEG images. The extracted JPEGs were byte-for-byte identical to the source images. You can't get higher quality than that.
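
If you want to verify that on your own files, comparing checksums of the originals against what pdfimages extracted is a quick check (the filenames here are just placeholders for your own):

md5sum original-scan.jpg page-000.jpg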


convert doesn't work for me, but this (pdftoppm) works perfectly. Each of the commands below ensures an "images" directory exists (creating it if it doesn't) and stores the generated images in that directory.

1200 DPI

mkdir -p images && pdftoppm -jpeg -r 1200 mypdf.pdf images/pg

600 DPI

mkdir -p images && pdftoppm -jpeg -r 600 mypdf.pdf images/pg

300 DPI (produces ~1 MB files per page)

mkdir -p images && pdftoppm -jpeg -r 300 mypdf.pdf images/pg

300 DPI with least compression/highest quality (produces ~2 MB files per page)

mkdir -p images && pdftoppm -jpeg -jpegopt quality=100 -r 300 mypdf.pdf images/pg

Additional reading:

  1. https://stackoverflow.com/questions/43085889/how-to-convert-a-pdf-into-jpg-with-commandline-in-linux/61700520#61700520
  2. https://stackoverflow.com/questions/6605006/convert-pdf-to-image-with-high-resolution/58795684#58795684
  3. https://askubuntu.com/questions/150100/extracting-embedded-images-from-a-pdf/1187844#1187844

As student's answer said, pdfimages is a good option. In my experience, both gs and convert export at poor quality even when you specify the right DPI.

But if the PDF has multiple layers per page, pdfimages doesn't work: it extracts each layer as a separate image. In that case, the best option is to use Inkscape to export each page as it is seen.

These are the commands I use:

pdftk combined_to_do.pdf burst output pg_%04d.pdf
ls ./pg*.pdf | xargs -L1 -I {}  inkscape {} -z --export-dpi=300 --export-area-drawing --export-png={}.png

The first command splits the PDF into individual pages; the second converts each page to PNG. You can keep them as PNG or convert them to JPEG:

ls ./p*.png | xargs -L1 -I {} convert {}  -quality 100 -density 300 {}.jpg

Compared to pdfimages, gs, and ImageMagick's convert, I find Inkscape's export the best in quality.
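
One caveat about the Inkscape command above: those flags are for the 0.9x series. If you're on Inkscape 1.x (an assumption about your setup), -z is gone and --export-png was replaced, so the equivalent would be something like:

ls ./pg*.pdf | xargs -L1 -I {} inkscape {} --export-type=png --export-dpi=300 --export-area-drawing --export-filename={}.png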