How can I rasterize all of the text in a PDF?

You could test whether image-based PDFs are polluted as well. First convert the PDF to a (multipage) TIFF, e.g. with Ghostscript:

gs -sDEVICE=tiffg4 -o sample.tif sample.pdf

Then convert the TIFF to PDF, e.g.:

tiff2pdf -z -f -F -pA4 -o sample-img.pdf sample.tif

This results in a PDF file whose pages are images instead of text.
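
If you need to do this for more than one file, the two steps above can be wrapped in a small shell function. This is only a sketch: the function name rasterize_pdf and the DRY_RUN switch are made up for illustration, and it assumes gs and tiff2pdf are on your PATH and that the output name ends in .pdf.

```shell
#!/bin/sh
# Sketch: rasterize a PDF by going through an intermediate multipage TIFF.
# rasterize_pdf and DRY_RUN are hypothetical names, not part of any tool.

rasterize_pdf() {
    in_pdf=$1
    out_pdf=$2
    # Intermediate TIFF sits next to the output, e.g. sample-img.tif
    tmp_tif="${out_pdf%.pdf}.tif"

    # With DRY_RUN=1 the commands are printed instead of executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run=echo
    else
        run=
    fi

    # Step 1: render every page into a bilevel (G4) multipage TIFF.
    $run gs -sDEVICE=tiffg4 -o "$tmp_tif" "$in_pdf" || return 1
    # Step 2: wrap the TIFF pages back into an image-only PDF.
    $run tiff2pdf -z -f -F -pA4 -o "$out_pdf" "$tmp_tif" || return 1
    # Step 3: remove the intermediate TIFF.
    $run rm -f "$tmp_tif"
}
```

Running `DRY_RUN=1; rasterize_pdf sample.pdf sample-img.pdf` only prints the three commands, so you can check them before letting the function touch any files.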

Alternatively, if your system supports printing TIFF files, try printing the TIFF directly.

There is also the option of using pdf2ps to convert the PDF to PostScript, which, if it works, would likely be preferable.

Tags:

Linux

Pdf

Ocr

Pdftk