Modifying PDF files

I mainly use pdftk, but here are some others to consider:

pdfsam (PDF Split and Merge): "pdfsam is an open source tool (GPL license) designed to handle pdf files"

PDFJam: "A small collection of shell scripts which provide a simple interface to much of the functionality of the excellent pdfpages PDF file package (by Andreas Matthias) for pdfLaTeX." (You can also use pdfLaTeX directly.)

jPDFTweak: "jPDF Tweak is a Java Swing application that can combine, split, rotate, reorder, watermark, encrypt, sign, and otherwise tweak PDF files."

Inkscape: A vector graphics editor that can import PDF pages into its native SVG format and export them as PDF again.

Calibre: Open source ebook management software that can convert PDFs to other formats and manipulate them in other ways. It comes with command-line tools such as pdfmanipulate, which can be useful.

Ghostscript of course can do a lot of things with PDF files too.


I know of two programs for manipulating PDFs under Linux:

PDFedit: "PDFedit is a free open source pdf editor and a library for manipulating PDF documents, released under terms of GNU GPL version 2. It includes PDF manipulating library based on xpdf, GUI, set of command line tools and a pdf editor."

and pdftk: "If PDF is electronic paper, then pdftk is an electronic staple-remover, hole-punch, binder, secret-decoder-ring, and X-Ray-glasses. Pdftk is a simple tool for doing everyday things with PDF documents."


LaTeX with the pdfpages and bookmark packages can do most of these things.

This works by creating a new TeX document and including the original PDF documents (or parts of them) via \includepdf (see the pdfpages manual).
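
As a minimal sketch of this approach, assume a hypothetical input file named original.pdf sitting next to the TeX file. The file name and the page ranges are placeholders, not anything prescribed by pdfpages:

```latex
\documentclass{article}
\usepackage{pdfpages}
\begin{document}
% original.pdf is a placeholder name for the existing PDF
\includepdf[pages={1-4}]{original.pdf}  % pages 1 to 4
\includepdf[pages={10-}]{original.pdf}  % page 10 to the last page
\end{document}
```

Writing pages=- instead includes every page of the file. Compile with pdflatex.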

You can also change the page numbering, e.g.

\pagenumbering{roman}
\setcounter{page}{1}
% include pdf pages that should have roman numbering (the front matter)
\pagenumbering{arabic} % switch to arabic numbering
\setcounter{page}{1} % reset page counter
% include pdf pages that should have arabic numbering (the main matter)

These "logical" page numbers are merely labels that most PDF readers can use to navigate to a particular page. The underlying "physical" page numbers still run consecutively from 1 and are what lower-level interactions use (see below).
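
To make the placeholder comments in the snippet above concrete, here is a sketch that assumes a hypothetical orig.pdf whose first 14 pages are front matter. It also loads hyperref with its pdfpagelabels option, which is what writes the logical labels into the PDF. As far as I know the option is on by default in recent hyperref versions, so setting it explicitly is only for clarity:

```latex
\documentclass{article}
\usepackage{pdfpages}
\usepackage[pdfpagelabels=true]{hyperref} % writes the logical page labels
\begin{document}
\pagenumbering{roman}
\setcounter{page}{1}
\includepdf[pages={1-14}]{orig.pdf}  % front matter: labels i to xiv
\pagenumbering{arabic}
\setcounter{page}{1}
\includepdf[pages={15-}]{orig.pdf}   % main matter: labels start again at 1
\end{document}
```

By default pdfpages gives included pages an empty page style, so LaTeX should not print its own page numbers on top of the original pages.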

After you've included the existing PDF pages with the correct logical page numbering, you can set PDF bookmarks ("outlines") using the \bookmark command. The basic syntax is

\bookmark[page=<pagenumber>,level=<level>]{<title>}

where <pagenumber> is the page number of the target page. Note that these are not the "logical" page numbers defined earlier, but the "physical" page numbers running consecutively from 1, counted from the beginning of the PDF. Bookmarks are nested by specifying <level>, where 0 is the top level. The <title> is the text shown in the PDF reader's outline. See the bookmark manual for details.
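
For illustration, a sketch with made-up titles and page numbers, assuming a hypothetical orig.pdf with at least 17 pages:

```latex
\documentclass{article}
\usepackage{pdfpages}
\usepackage{bookmark} % loads hyperref as well
\begin{document}
\includepdf[pages=-]{orig.pdf}
% page= refers to physical pages of the output PDF, counted from 1
\bookmark[page=1,level=0]{Cover}
\bookmark[page=15,level=0]{Chapter 1}
\bookmark[page=17,level=1]{Section 1.1} % nested under Chapter 1
\end{document}
```

If the outline does not show up in the PDF reader after the first run, compiling a second time is a safe default.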

Compiling the TeX file will generate a new PDF with the desired page numbers and bookmarks.

For a complete example of how all of this comes together, see https://michaelgoerz.net/notes/pdf-bookmarks-with-latex.html