pdfbox: how to clone a page

The least resource-intensive way to clone a page is a shallow copy of the corresponding dictionary:

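// PDFBox 1.8.x API
import java.util.List;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

PDDocument doc = PDDocument.load( file );

List<PDPage> allPages = doc.getDocumentCatalog().getAllPages();

// shallow copy of the first page's dictionary: the copy shares all values with the original
PDPage page = allPages.get(0);
COSDictionary pageDict = page.getCOSDictionary();
COSDictionary newPageDict = new COSDictionary(pageDict);

// drop the annotations; their page references would still point to the original page
newPageDict.removeItem(COSName.ANNOTS);

PDPage newPage = new PDPage(newPageDict);
doc.addPage(newPage);

doc.save( outfile );
doc.close();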

I explicitly deleted the annotations (form fields etc.) of the copy because an annotation has a reference pointing back to its page, which in the copied page obviously is wrong.

Thus, if you want the annotations to come along in a clean way, you have to create shallow copies of the annotations array and all contained annotation dictionaries, too, and replace the page reference therein.
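To illustrate the clean variant just described, here is a minimal sketch in plain Java rather than PDFBox calls. It is only a model: a Map stands in for a PDF dictionary and a List for a PDF array, and the class and method names are made up for this example. The keys "Annots" and "P" are the actual PDF names of the annotations array and the annotation's page reference.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model (an assumption of this sketch, not PDFBox API):
// Map = PDF dictionary, List = PDF array.
class AnnotationCopySketch {

    /** Shallow-copies a page, its annotations array and each annotation dictionary. */
    static Map<String, Object> clonePageWithAnnotations(Map<String, Object> page) {
        // Shallow copy of the page dictionary: all values are still shared.
        Map<String, Object> newPage = new HashMap<>(page);

        Object annots = page.get("Annots");
        if (annots instanceof List) {
            List<Object> newAnnots = new ArrayList<>();
            for (Object entry : (List<?>) annots) {
                if (entry instanceof Map) {
                    // Shallow copy of the annotation dictionary ...
                    @SuppressWarnings("unchecked")
                    Map<String, Object> copy = new HashMap<>((Map<String, Object>) entry);
                    // ... with its page reference re-pointed to the new page.
                    copy.put("P", newPage);
                    newAnnots.add(copy);
                } else {
                    newAnnots.add(entry);
                }
            }
            // The new page gets its own annotations array.
            newPage.put("Annots", newAnnots);
        }
        return newPage;
    }

    public static void main(String[] args) {
        Map<String, Object> page = new HashMap<>();
        Map<String, Object> annot = new HashMap<>();
        annot.put("Subtype", "Widget");
        annot.put("P", page);
        List<Object> annots = new ArrayList<>();
        annots.add(annot);
        page.put("Annots", annots);

        Map<String, Object> clone = clonePageWithAnnotations(page);
        Map<?, ?> cloneAnnot = (Map<?, ?>) ((List<?>) clone.get("Annots")).get(0);

        // Compare by identity only; the structures are cyclic.
        System.out.println(cloneAnnot.get("P") == clone); // true
        System.out.println(annot.get("P") == page);       // true
    }
}
```

With PDFBox the same steps would operate on the COSArray under COSName.ANNOTS and on the COSDictionary objects it contains.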

Most PDF readers would not mind, though, if the page references are incorrect. For a dirty solution, therefore, you could simply leave the annotations in the page dictionary. But who wants to be dirty... ;)

If you want to additionally change some parts of the new or the old page, you obviously also have to copy the respective PDF objects before manipulating them.
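Why the copy is needed can be seen in a small plain-Java model (again a Map standing in for a PDF dictionary; the helper name detach and the key "Example" are made up for this sketch): after a shallow copy both pages point to the very same nested object, so it has to be replaced by a private copy before it is changed.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model (an assumption of this sketch, not PDFBox API).
class CopyBeforeModifySketch {

    /** Replaces the nested dictionary under key by a private shallow copy and returns it. */
    static Map<String, Object> detach(Map<String, Object> owner, String key) {
        @SuppressWarnings("unchecked")
        Map<String, Object> shared = (Map<String, Object>) owner.get(key);
        Map<String, Object> own = (shared == null) ? new HashMap<>() : new HashMap<>(shared);
        owner.put(key, own);
        return own;
    }

    public static void main(String[] args) {
        Map<String, Object> resources = new HashMap<>();
        resources.put("Example", "old");
        Map<String, Object> page = new HashMap<>();
        page.put("Resources", resources);

        // Shallow copy: the clone refers to the same resources object.
        Map<String, Object> clone = new HashMap<>(page);
        System.out.println(clone.get("Resources") == resources); // true

        // Detach first, then modify only the clone's private copy.
        detach(clone, "Resources").put("Example", "new");
        System.out.println(resources.get("Example"));            // old
    }
}
```

Without the detach step the put call would have changed the original page as well.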

Some other remarks:

Your original page cloning looks weird to me. After all, you add the identical page dictionary to the document again (duplicate entries in the page tree are ignored, I think) and then merge these identical page objects.

I assume the PDFCloneUtility is meant for cloning between different documents, not within the same one; in any case, merging a dictionary into itself cannot be expected to work.

"I would like to get a reference to all the PDFields for any form fields in this newly cloned page"

As the fields have the same name, they are identical!

Fields in PDF are abstract fields which can have many appearances spread over the document. The same name implies the same field.

A field appearing on some page means that there is an annotation representing that field on the page. To make things more complicated, field dictionary and annotation dictionary can be merged for fields with one appearance only.

Thus, depending on your requirements you will first have to decide whether you want to work with fields or with field annotations.
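The relation between fields and their annotations can be pictured with a small plain-Java model. It does not use PDFBox; the Widget class and the field names are made up for this sketch. Grouping the annotations by field name shows that a cloned page adds appearances to existing fields but no new fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model (an assumption of this sketch, not PDFBox API).
class FieldWidgetSketch {

    /** One appearance of a field on a page. */
    static final class Widget {
        final String fieldName;
        final int pageIndex;

        Widget(String fieldName, int pageIndex) {
            this.fieldName = fieldName;
            this.pageIndex = pageIndex;
        }
    }

    /** Maps each field name to the pages on which the field appears. */
    static Map<String, List<Integer>> pagesByField(List<Widget> widgets) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (Widget w : widgets) {
            // Same name means same field, so the page is added to the existing entry.
            result.computeIfAbsent(w.fieldName, k -> new ArrayList<>()).add(w.pageIndex);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Widget> widgets = new ArrayList<>();
        widgets.add(new Widget("name", 0));
        widgets.add(new Widget("email", 0));
        // Annotations carried over to a clone of page 0, added as page 1.
        widgets.add(new Widget("name", 1));
        widgets.add(new Widget("email", 1));

        // Still two fields, each with two appearances.
        System.out.println(pagesByField(widgets)); // {name=[0, 1], email=[0, 1]}
    }
}
```

If the cloned page should carry independent fields instead, the copied annotations would need fields with different names.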

Tags: Java, PDFBox