Chop pages of a PDFs into multiple pages

You can solve this with the help of Ghostscript. pdftk alone cannot do that (to the best of my knowledge). I'll give you the commandline steps to do this manually. It will be easy to script this as a procedure, also with different parameters for page sizes and page numbers. But you said that you can do that yourself ;-)

How to solve this with the help of Ghostscript...

...and for the fun of it, I've recently done it not with an input file featuring "double-up" pages, but one with "treble-ups". You can read the answer for this case here.

Your case is even simpler. You seem to have something similar to this:

+------------+------------+   ^
|            |            |   |
|      1     |      2     |   |
|            |            | 595 pt
|            |            |   |
|            |            |   |
|            |            |   |
+------------+------------+   v
             ^
            fold
             v
+------------+------------+   ^
|            |            |   |
|      3     |      4     |   |
|            |            | 595 pt
|            |            |   |
|            |            |   |
|            |            |   |
+------------+------------+   v
<---------- 842 pt -------->

You want to create 1 PDF with 4 pages, each of which has the size of 421 pt x 595 pt.

First Step

Let's first extract the left sections from each of the input pages:

gs \
    -o left-sections.pdf \
    -sDEVICE=pdfwrite \
    -g4210x5950 \
    -c "<</PageOffset [0 0]>> setpagedevice" \
    -f double-page-input.pdf

What did these parameters do?

First, know that in PDF 1 inch == 72 points. Then the rest is:

  • -o ...............: Names output file. Implicitely also uses -dBATCH -dNOPAUSE -dSAFER.
  • -sDEVICE=pdfwrite : we want PDF as output format.
  • -g................: sets output media size in pixels. pdfwrite's default resolution is 720 dpi. Hence multiply by 10 to get a match for PageOffset.
  • -c "..............: asks Ghostscript to process the given PostScript code snippet just before the main input file (which needs to follow with -f).
  • <</PageOffset ....: sets shifting of page image on the medium. (Of course, for left pages the shift by [0 0] has no real effect.)
  • -f ...............: process this input file.

Which result did the last command achieve?

This one:

Output file: left-sections.pdf, page 1
+------------+  ^
|            |  |
|     1      |  |
|            |595 pt
|            |  |
|            |  |
|            |  |
+------------+  v

Output file: left-sections.pdf, page 2
+------------+  ^
|            |  |
|     3      |  |
|            |595 pt
|            |  |
|            |  |
|            |  |
+------------+  v
<-- 421 pt -->

Second Step

Next, the right sections:

gs \
    -o right-sections.pdf \
    -sDEVICE=pdfwrite \
    -g4210x5950 \
    -c "<</PageOffset [-421 0]>> setpagedevice" \
    -f double-page-input.pdf

Note the negative offset since we are shifting the page to the left while keeping the viewing area stationary.

Result:

Output file: right-sections.pdf, page 1
+------------+  ^
|            |  |
|     2      |  |
|            |595 pt
|            |  |
|            |  |
|            |  |
+------------+  v

Output file: right-sections.pdf, page 2
+------------+  ^
|            |  |
|     4      |  |
|            |595 pt
|            |  |
|            |  |
|            |  |
+------------+  v
<-- 421 pt -->

Last Step

Now we combine the pages into one file. We could do that with ghostscript as well, but we'll use pdftk instead, because it's faster for this job:

pdftk \
  A=right-sections.pdf \
  B=left-sections.pdf \
  shuffle \
  output single-pages-output.pdf
  verbose

Done. Here is the desired result. 4 different pages, sized 421x595 pt.

Result:

+------------+ +------------+ +------------+ +------------+   ^
|            | |            | |            | |            |   |
|     1      | |     2      | |     3      | |     4      |   |
|            | |            | |            | |            |5595 pt
|            | |            | |            | |            |   |
|            | |            | |            | |            |   |
|            | |            | |            | |            |   |
+------------+ +------------+ +------------+ +------------+   v
<-- 421 pt --> <-- 421 pt --> <-- 421 pt --> <-- 421 pt -->

There is a tool pdfposter which can be used to create PDFs with several pages for one input page (tiling or chopping the pages). It is similar to the tool poster, which does the same for PostScript files.


So, after a lot more searching (it seems that "PDF cut pages" is a far better search), I found a little script called unpnup which uses poster, PDF/PS conversion, and pdftk to do exactly what I need. It's a bit of a long way around, but it's far superior to the other methods I found (such as using imagemagick) because it doesn't rasterise the pages before spitting them out.

Just in case mobileread goes away for some reason, the core of the script (licenced under the GPLv2 or later by Harald Hackenberg <hackenberggmx.at>) is as follows:

pdftk "$1" burst
for file in pg*.pdf;
do
    pdftops -eps $file
    poster -v -pA4 -mA5 -c0% `basename $file .pdf`.eps > `basename $file .pdf`.tps
    epstopdf `basename $file .pdf`.tps
done
pdftk pg*.pdf cat output ../`basename $1 .pdf`_unpnuped.pdf