How can I split each PDF page into two pages, using the command line?

This should work; it needs the pdftk tool (and Ghostscript).

A simple case:

Step One: Split into individual pages

 pdftk clpdf.pdf burst

This produces the files pg_0001.pdf, pg_0002.pdf, ... pg_NNNN.pdf, one for each page. It also produces doc_data.txt, which contains the page dimensions.

Step Two: Create left and right half pages

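  # Page width and height in points, read from the first page (assumed to be whole numbers)
  pw=$(awk '/PageMediaDimensions/ {print $2; exit}' doc_data.txt)
  ph=$(awk '/PageMediaDimensions/ {print $3; exit}' doc_data.txt)
  w2=$(( pw / 2 ))
  # -g takes device pixels; pdfwrite defaults to 720 dpi, i.e. 10 pixels per point
  w2px=$(( w2*10 ))
  hpx=$((  ph*10 ))
  for f in pg_[0-9]*.pdf ; do
   lf=left_$f
   rf=right_$f
   # Left half: no offset. Right half: shift the page left by half its width (in points).
   gs -o "${lf}" -sDEVICE=pdfwrite -g${w2px}x${hpx} -c "<</PageOffset [0 0]>> setpagedevice" -f "${f}"
   gs -o "${rf}" -sDEVICE=pdfwrite -g${w2px}x${hpx} -c "<</PageOffset [-${w2} 0]>> setpagedevice" -f "${f}"
  done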
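
One caveat with the arithmetic above: doc_data.txt may report fractional sizes (for example 595.32), which bash's $(( )) rejects. Below is a minimal sketch of one way around that, rounding with awk before doing the integer maths. The function names are hypothetical helpers, not part of pdftk or Ghostscript, and the factor of 10 assumes pdfwrite's default 720 dpi resolution.

```bash
#!/bin/bash
# Hypothetical helpers (illustration only, not part of pdftk or Ghostscript).

# Round a possibly fractional point value to the nearest whole point.
round_pt() {
  awk -v v="$1" 'BEGIN { printf "%d\n", v + 0.5 }'
}

# Convert whole points to pdfwrite device pixels.
# Assumes the default 720 dpi: 720 / 72 points per inch = 10 pixels per point.
pt_to_px() {
  echo $(( $1 * 10 ))
}

# Print "width height" of the first page listed in a pdftk doc_data.txt file.
first_page_size() {
  awk '/PageMediaDimensions/ { print $2, $3; exit }' "$1"
}

# Possible use in Step Two:
#   read -r rawW rawH < <(first_page_size doc_data.txt)
#   pw=$(round_pt "$rawW"); ph=$(round_pt "$rawH")
#   w2=$(( pw / 2 )); w2px=$(pt_to_px "$w2"); hpx=$(pt_to_px "$ph")
```

Rounding changes the page size by at most half a point, which is usually invisible on a scanned page.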

Step Three: Merge the left and right halves, in page order, into newfile.pdf, which then holds one half per page.

  ls -1 [lr]*_[0-9]*pdf | sort -n -k3 -t_ > fl
  pdftk `cat fl`  cat output newfile.pdf 
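
If you would rather not depend on sort and a temporary file, the same ordering can be built directly from the pg_NNNN.pdf names. This is only a sketch: ordered_halves is a hypothetical helper, and it assumes the zero-padded names that burst produces, so that the shell glob already lists them in page order.

```bash
#!/bin/bash
# Hypothetical helper: print the left/right halves in page order, one per line.
# Relies on the zero-padded pg_NNNN.pdf names, which glob in page order.
ordered_halves() {
  local dir=${1:-.} f base
  for f in "$dir"/pg_[0-9]*.pdf; do
    [ -e "$f" ] || continue        # skip the unexpanded pattern when nothing matches
    base=${f##*/}                  # strip the directory part
    printf '%s\n' "$dir/left_$base" "$dir/right_$base"
  done
}

# Possible use (assumes pdftk is installed and file names contain no newlines):
#   mapfile -t halves < <(ordered_halves .)
#   pdftk "${halves[@]}" cat output newfile.pdf
```

Passing the names as an array also keeps paths with spaces intact, which the backtick form does not.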

A more general case:

  1. The example above assumes all pages are the same size. The doc_data.txt file contains the size of each split page. If the command

    grep PageMediaDimensions <doc_data.txt | sort | uniq | wc -l

    does not return 1 then the pages have different dimensions and some extra logic is needed in Step Two.

  2. If the split is not exactly 50:50 then a better formula than w2=$(( pw / 2 )), used in the example above, is needed.
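
The size check from point 1 can be wrapped in a small test so a script can pick between the simple and the general procedure. A minimal sketch, with hypothetical function names, assuming the "PageMediaDimensions: width height" lines described above:

```bash
#!/bin/bash
# Hypothetical helper: count the distinct page sizes listed in a pdftk doc_data.txt.
distinct_page_sizes() {
  awk '/PageMediaDimensions/ { seen[$2 " " $3] = 1 }
       END { n = 0; for (k in seen) n++; print n }' "$1"
}

# Succeeds (exit status 0) only when every page has the same size.
pages_uniform() {
  [ "$(distinct_page_sizes "$1")" -eq 1 ]
}

# Possible use:
#   if pages_uniform doc_data.txt; then echo "simple case"; else echo "general case"; fi
```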

This second example shows how to handle this more general case.

Step One: split with pdftk as before

Step Two: Now create three files that contain the width of each page, the height of each page, and a default for the fraction of the page that the left half will use.

  grep PageMediaDimensions <doc_data.txt | awk '{print $2}'    >   pws.txt
  grep PageMediaDimensions <doc_data.txt | awk '{print $3}'    > phs.txt
  grep PageMediaDimensions <doc_data.txt | awk '{print "0.5"}' > lfrac.txt

The file lfrac.txt can be edited by hand if you know where individual pages should be split.

Step Three: Now create the left and right split pages, using the individual page sizes and (if edited) the individual split fractions.

#!/bin/bash
exec 3<pws.txt
exec 4<phs.txt
exec 5<lfrac.txt

for f in  pg_[0-9]*.pdf ; do
 read <&3 pwloc
 read <&4 phloc
 read <&5 lfr
 wl=`echo "($lfr)"'*'"$pwloc" | bc -l`;wl=`printf "%0.f" $wl`
 wr=$(( pwloc - wl ))
 lf=left_$f
 rf=right_$f
 hpx=$((  phloc*10 ))
 w2px=$(( wl*10 ))
 gs -o ${lf} -sDEVICE=pdfwrite -g${w2px}x${hpx} -c "<</PageOffset [0 0]>> setpagedevice" -f ${f}
 w2px=$(( wr*10 ))
 gs -o ${rf} -sDEVICE=pdfwrite -g${w2px}x${hpx} -c "<</PageOffset [-${wl} 0]>> setpagedevice" -f ${f}
done
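
The width calculation in the loop can also be done in a single awk call, which avoids bc and tolerates fractional page widths. This is a sketch only: split_widths is a hypothetical helper, and it rounds halves up, which can differ by one point from the printf rounding used above.

```bash
#!/bin/bash
# Hypothetical helper: given a page width in points and the left fraction,
# print "left_width right_width" as whole points that sum to the rounded width.
split_widths() {
  awk -v pw="$1" -v fr="$2" 'BEGIN {
    total = int(pw + 0.5)          # round the page width to whole points
    left  = int(total * fr + 0.5)  # round the left share to whole points
    print left, total - left
  }'
}

# Possible use inside the loop:
#   read -r wl wr < <(split_widths "$pwloc" "$lfr")
```

Computing the right width as the remainder guarantees that the two halves together cover the whole page.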

Step Four: This is the same merge step as in the previous, simpler example.

  ls -1 [lr]*_[0-9]*pdf | sort -n -k3 -t_ > fl
  pdftk `cat fl`  cat output newfile.pdf 

You can widen your choice of tools by converting the pdf to PostScript as follows, then using pstops. I've assumed we start from an A4 portrait page showing two pages as they might have been scanned from an open book, with the spine going horizontally through the middle, like this:

[image: the original page]

Obviously, you can change the values in the solution below to fit your precise case.

You can convert this pdf to PostScript with pdf2ps (which is part of the ghostscript package). Then the pstops tool, from the psutils package, can be used to rotate the page right (clockwise) around the bottom left corner, rescale it, and move the result up so that only the bottom half covers a whole page:

[image: one resulting page]

A second page can be created from the same original page by a similar rotation, scale, and translation. The result can be converted back to pdf. A single command can draw each page onto 2 new pages:

pdf2ps myfile.pdf out.ps
pstops -p a4 '0R@1.2(1cm,29cm),0R@1.2(-16cm,29cm)' out.ps new.ps
ps2pdf new.ps new.pdf

The syntax is explained in the man page. Here we have R for rotate right, @1.2 to scale, (x,y) to move the result. The comma (,) produces 2 pages from each original page.

Note that this will double the size of the resulting pdf, since each page is fully drawn twice, even though you only see half of it each time.


You want Poppler, or more precisely the pdfimages tool that ships with it. It is free software and extracts the images embedded in the PDF. If the PDF contains scanned images, they are not always oriented correctly and may be off by a few degrees. If each PDF page contains two images, one for each scanned page, the job is easy; if not, you will have to cut them manually (dirty) or try ImageMagick to split them.

http://poppler.freedesktop.org/

http://en.wikipedia.org/wiki/Pdfimages

Taken from stackoverflow.