How to view and edit the code of a PDF file

Regarding your first question ("viewing source code, but no binary"): you have a few options for de-compressing the binary streams that are attached to many of the objects inside a PDF.

My favorite tool for this is QPDF, available on all major OS platforms. The following command de-compresses all streams and all object streams:

 qpdf --qdf --object-streams=disable orig.pdf expanded.pdf

Now you can open your PDF in any text editor. (There may still be some binary blobs in there, for example font files and ICC profiles, which it wouldn't make sense for QPDF to expand.)

To re-compress expanded.pdf after editing, you can run:

 qpdf expanded.pdf orig2.pdf
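
The two commands above can be wrapped in small shell functions. This is only a sketch: the names `pdf_expand` and `pdf_pack` are made up for illustration, and the final `qpdf --check` call is an extra sanity check that is not part of the commands shown above.

```shell
#!/bin/sh
# Hypothetical helper names; only the first two qpdf invocations come from the answer.

# Expand a PDF into an uncompressed copy that a text editor can handle.
pdf_expand() {
    qpdf --qdf --object-streams=disable "$1" "$2"
}

# Re-compress an edited copy, then let qpdf inspect the result.
# (The --check step is an addition, assumed to be useful after hand edits.)
pdf_pack() {
    qpdf "$1" "$2"
    qpdf --check "$2"
}

# Example usage (needs qpdf and an existing orig.pdf):
# pdf_expand orig.pdf expanded.pdf
# ... edit expanded.pdf ...
# pdf_pack expanded.pdf orig2.pdf
```

Keeping the expand and pack steps separate lets you edit expanded.pdf by hand, or with a script, in between.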

(Careful when manually editing PDFs! You need to know a lot about their internal syntax to do this right. As soon as you add or delete a single byte, PDF readers may report errors or fail to open the file, because the PDF's internal table of contents (the cross-reference table), which is based on byte offsets, no longer matches the file. Just replacing /Fit with /XYZ should go fine, though, since both strings have the same length.)


You can use sed with binary files (at least GNU sed; some implementations may have trouble with files containing null characters or not ending with a newline character). But the command you used only replaces the first occurrence of /Fit on each line, and lines are pretty much meaningless in a PDF file. You need to replace all occurrences:

 sed 's/\/Fit/\/XYZ/g'
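
To see what the `g` flag changes, here is a small self-contained sketch. The sample line is invented for illustration and is not taken from a real PDF; all it needs is two `/Fit` occurrences on one line.

```shell
#!/bin/sh
# Made-up sample: two /Fit destinations on a single line.
sample='[3 0 R /Fit] [5 0 R /Fit]'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$sample" | sed 's/\/Fit/\/XYZ/')

# With g: every match on the line is replaced.
all=$(printf '%s\n' "$sample" | sed 's/\/Fit/\/XYZ/g')

printf '%s\n%s\n' "$first_only" "$all"
```

The first line printed still contains one `/Fit`; the second contains none. The sed expression is quoted so that the shell passes the backslashes through to sed unchanged.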

It would be more robust to replace /Fit only if it's not followed by a word constituent (e.g. leaving /FitH or /Fitness alone; I don't know if your file contains occurrences of /Fit that would cause trouble). Here's one way:

 perl -pe 's!/Fit\b!/XYZ!g'
