How to identify the format of images in a PDF?

AFAIK, the Image XObjects embedded inside PDFs do not store any information about the original image format. At most, if it's an embedded JPEG, it can be extracted as-is; in all other cases you end up with a PxM image (PBM, PGM, or PPM) that you'll need to convert.
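What a PDF does record for each image is the stream filter, and that is enough to tell the "JPEG, extractable as-is" case apart from the rest. Below is a rough sketch of that idea, not a real PDF parser: the names image_filters, describe, and FILTER_HINTS are made up for illustration, and it assumes the image dictionaries are stored uncompressed in the file body (it will miss images inside object streams and inline images).

```python
import re

# What each standard PDF stream filter tells us about the stored image data.
FILTER_HINTS = {
    "DCTDecode": "JPEG data (can be saved as-is)",
    "JPXDecode": "JPEG 2000 data",
    "CCITTFaxDecode": "CCITT Group 3/4 fax data",
    "JBIG2Decode": "JBIG2 data",
    "FlateDecode": "raw samples, zlib-compressed",
    "LZWDecode": "raw samples, LZW-compressed",
    "RunLengthDecode": "raw samples, run-length encoded",
    "ASCIIHexDecode": "raw samples, hex-encoded",
    "ASCII85Decode": "raw samples, ASCII85-encoded",
}

# Text between "N G obj" and the first "stream" or "endobj" (the dictionary).
_OBJ = re.compile(rb"\d+\s+\d+\s+obj(.*?)(?:stream|endobj)", re.DOTALL)
_IMAGE = re.compile(rb"/Subtype\s*/Image\b")
# /Filter is either a single name or an array of names.
_FILTER = re.compile(rb"/Filter\s*(?:/(\w+)|\[([^\]]*)\])")


def image_filters(pdf_bytes):
    """Return one list of filter names per image object found (naive scan)."""
    results = []
    for match in _OBJ.finditer(pdf_bytes):
        header = match.group(1)
        if not _IMAGE.search(header):
            continue
        found = _FILTER.search(header)
        if found is None:
            names = []
        elif found.group(1):
            names = [found.group(1).decode("ascii")]
        else:
            names = [n.decode("ascii")
                     for n in re.findall(rb"/(\w+)", found.group(2))]
        results.append(names)
    return results


def describe(names):
    """Describe the stored data; the last filter in a chain is the innermost."""
    if not names:
        return "raw samples, uncompressed"
    return FILTER_HINTS.get(names[-1], "unknown filter")
```

Usage would be something like `for f in image_filters(open("file.pdf", "rb").read()): print(describe(f))`, where file.pdf is a placeholder. Anything that does not come back as JPEG or JPEG 2000 has no original file format left to recover, which matches the point above.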


The picture is in the portable pixmap (PPM) file format (see Wikipedia: Netpbm format for details).
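If you want to check this yourself, every Netpbm file starts with a two-byte magic number (P1 to P6) followed by the dimensions in plain ASCII. Here is a small sketch that reads that header; the function name read_netpbm_header is my own, and it only covers the classic PBM/PGM/PPM variants, not PAM (P7).

```python
# Magic number -> (format name, encoding) for the classic Netpbm formats.
NETPBM_MAGIC = {
    b"P1": ("PBM", "plain"),
    b"P2": ("PGM", "plain"),
    b"P3": ("PPM", "plain"),
    b"P4": ("PBM", "binary"),
    b"P5": ("PGM", "binary"),
    b"P6": ("PPM", "binary"),
}


def read_netpbm_header(data):
    """Parse the header of a PBM/PGM/PPM file given as bytes."""
    magic = data[:2]
    if magic not in NETPBM_MAGIC:
        raise ValueError("not a PBM/PGM/PPM file")
    name, encoding = NETPBM_MAGIC[magic]
    # PBM has width and height only; PGM and PPM also carry a maxval.
    wanted = 2 if name == "PBM" else 3
    values = []
    pos = 2
    while len(values) < wanted:
        # Skip whitespace and '#' comments between header fields.
        while pos < len(data) and (data[pos:pos + 1].isspace()
                                   or data[pos:pos + 1] == b"#"):
            if data[pos:pos + 1] == b"#":
                newline = data.find(b"\n", pos)
                pos = len(data) if newline == -1 else newline
            else:
                pos += 1
        start = pos
        while pos < len(data) and data[pos:pos + 1].isdigit():
            pos += 1
        if start == pos:
            raise ValueError("truncated or malformed header")
        values.append(int(data[start:pos]))
    header = {"format": name, "encoding": encoding,
              "width": values[0], "height": values[1]}
    if wanted == 3:
        header["maxval"] = values[2]
    return header
```

Reading the first few dozen bytes of images-000.ppm and passing them in is enough; a colour image from the extraction would normally report format PPM with binary encoding.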

You can use the netpbm tools to convert these to a more modern BMP.
The syntax for that is: ppmtobmp images-000.ppm > images-000.bmp

http://netpbm.sourceforge.net/ is the homepage for netpbm.
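If there are many extracted files, the same command can be run in a loop. This is a minimal sketch, assuming ppmtobmp is installed and on the PATH and that the images sit in one directory as *.ppm files; conversion_jobs and convert_all are names I made up.

```python
import shutil
import subprocess
from pathlib import Path


def conversion_jobs(directory):
    """Pair every *.ppm file in the directory with its .bmp output path."""
    return [(ppm, ppm.with_suffix(".bmp"))
            for ppm in sorted(Path(directory).glob("*.ppm"))]


def convert_all(directory, tool="ppmtobmp"):
    """Run `ppmtobmp in.ppm > out.bmp` for each PPM file found."""
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} not found; install netpbm first")
    converted = []
    for ppm, bmp in conversion_jobs(directory):
        # ppmtobmp writes the BMP to stdout, so redirect it into the file.
        with bmp.open("wb") as out:
            subprocess.run([tool, str(ppm)], stdout=out, check=True)
        converted.append(bmp)
    return converted
```

Calling `convert_all(".")` in the directory holding images-000.ppm, images-001.ppm and so on should leave a matching .bmp next to each one.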

Are there multiple images in a document? Or can we just search the PDF for the line with identify images-000.ppm, cut the file from that location and feed it to ppmtobmp? It should not be hard to automate that.