How do I save an image PDF file as an image?

Please pay close attention to pooryorick's answer, which points out that sleske's answer is actually a much better fit for this particular problem.


Use Ghostscript. This command works for me:

gs -dBATCH -dNOPAUSE -sDEVICE=png16m -dGraphicsAlphaBits=4 -dTextAlphaBits=4 -r150 -sOutputFile=output%d.png input.pdf

There are multiple png pseudo-devices, differentiating on color depth: pngmono, pnggray, png16, png256, png16m, and pngalpha. Choose whichever one suits you the best.

You can also use the jpeg device, but JPEG compression is lossy. Unless you have a disk space issue, you want as high a quality as you can manage for your OCR, and that's not JPEG.

Ghostscript no longer has support for GIF output, but I can't imagine why you'd need that, given that png256 is available.
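
If you run this on more than one file, it may help to wrap the command above in a small function. The following is only a sketch: the name pdf_to_png, the DRY_RUN switch, the 300 dpi default, and the pnggray default are my own assumptions, not part of Ghostscript. Only the gs options themselves come from the command above.

```shell
# Hypothetical helper: render every page of a PDF to PNG with Ghostscript.
# Usage: pdf_to_png input.pdf [dpi] [device]
# If DRY_RUN is set to a non-empty value, the command is printed, not run.
pdf_to_png() {
    input=$1
    dpi=${2:-300}          # assumed default; the command above uses 150
    device=${3:-pnggray}   # assumed default; pass png16m etc. to override
    base=$(basename "$input" .pdf)

    # Build the command as an argument list so file names with spaces survive.
    # %03d is expanded by Ghostscript to the page number (001, 002, ...).
    set -- gs -dBATCH -dNOPAUSE -sDEVICE="$device" \
        -dGraphicsAlphaBits=4 -dTextAlphaBits=4 \
        -r"$dpi" -sOutputFile="${base}-%03d.png" "$input"

    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example: pdf_to_png input.pdf 150 png16m
```

The PNG files are written to the current directory and named after the input file, so input.pdf yields input-001.png, input-002.png, and so on.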


Install ImageMagick. Open a cmd window or terminal:

convert myfile.pdf myfile.jpg

The output will be one JPEG file for each page in your PDF: myfile-0.jpg, myfile-1.jpg, etc.
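
The default rendering resolution is fairly low, so for OCR you will probably want to set -density before the input file. Here is a rough sketch of that. The function name pdf_to_jpg, the DRY_RUN switch, and the default values (150 dpi, quality 90) are assumptions of mine. It also assumes that ImageMagick 7 provides a magick command while older versions only have convert.

```shell
# Hypothetical helper: render a PDF to one JPEG per page with ImageMagick.
# Usage: pdf_to_jpg input.pdf [density]
# If DRY_RUN is set to a non-empty value, the command is printed, not run.
pdf_to_jpg() {
    input=$1
    density=${2:-150}      # assumed default resolution in dpi
    base=$(basename "$input" .pdf)

    # Prefer the ImageMagick 7 entry point, fall back to the older one.
    if command -v magick >/dev/null 2>&1; then
        im=magick
    else
        im=convert
    fi

    # -density must come before the input file to affect how the PDF is rasterized.
    set -- "$im" -density "$density" "$input" -quality 90 "${base}-%03d.jpg"

    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example: pdf_to_jpg myfile.pdf 300
```

Note that ImageMagick hands PDF rendering off to Ghostscript, so Ghostscript still needs to be installed for this to work.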


There's also pdfimages from the Xpdf tools (available from the XpdfReader site). It will not convert a whole PDF page to an image; rather, it extracts the images embedded in a PDF.

This is useful if the PDF contains text and images, and you want only the images. Also, it will extract the images in their original format, so no loss of quality is involved (unlike programs which render the whole page and then convert it to e.g. JPEG). Depending on your needs this might be useful.


Simple usage:

pdfimages -j -list mydocument.pdf mydocument-images

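This will read the input file mydocument.pdf, extract all images, and write them to individual files named mydocument-images-0000.jpg, mydocument-images-0001.jpg, etc. With Xpdf's pdfimages, -list additionally prints a one-line listing for each image. The Poppler version of pdfimages treats -list as list-only and writes no files, so leave it out there.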

Option -j makes it write embedded JPEG-compressed images as JPEG files, not as PBM/PGM/PPM files (which are uncompressed and huge). Note that images may still be written as PBM/PGM/PPM files, if that's how they were stored in the PDF input file.
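
Since a scanned PDF can contain a lot of images, you may want them in their own directory. A minimal sketch, assuming the hypothetical names extract_pdf_images and DRY_RUN and a default output directory called images (none of these come from pdfimages itself):

```shell
# Hypothetical helper: extract embedded images from a PDF into a directory.
# Usage: extract_pdf_images input.pdf [output-dir]
# If DRY_RUN is set to a non-empty value, the command is printed, not run.
extract_pdf_images() {
    input=$1
    outdir=${2:-images}    # assumed default directory name
    base=$(basename "$input" .pdf)

    # The last argument is the image root: a prefix for the output file names.
    set -- pdfimages -j "$input" "$outdir/$base"

    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        mkdir -p "$outdir" || return 1
        "$@"
    fi
}

# Example: extract_pdf_images mydocument.pdf extracted
```

The -list option is left out on purpose so that the sketch behaves the same way with both the Xpdf and the Poppler version of pdfimages.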
