PDF metadata removal using Java

First, note that a PDF can carry two kinds of metadata:

  1. XMP metadata
  2. The document information dictionary (DID, the older mechanism)

The XMP metadata is removed like this, where stamper is an existing PdfStamper:

PdfReader reader = stamper.getReader();
reader.getCatalog().remove(PdfName.METADATA);
reader.removeUnusedObjects();

The second is removed as SANN3 has described, by setting each entry to null:

HashMap<String, String> info = reader.getInfo();
info.put("Title", null);
info.put("Author", null);
info.put("Subject", null);
info.put("Keywords", null);
info.put("Creator", null);
info.put("Producer", null);
info.put("CreationDate", null);
info.put("ModDate", null);
info.put("Trapped", null);
stamper.setMoreInfo(info);

If you then open the PDF in a text editor and search it, you should find neither the /Info dictionary nor the XMP metadata.
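
The text-editor check can also be done in code. The following is only a rough sketch using the Java standard library, not part of iText; the class name MetadataMarkerCheck and the method containsMarker are made up for illustration. It looks for the ASCII markers /Info, /Metadata and the XMP packet header in the raw bytes. Note that it can give false positives (the marker text may occur elsewhere) and can miss entries stored inside compressed object streams.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: a crude byte-level search, not a PDF parser.
class MetadataMarkerCheck {

    // Returns true if the raw PDF bytes contain the given ASCII marker.
    // ISO-8859-1 maps every byte to one char, so binary data is not mangled.
    static boolean containsMarker(byte[] pdfBytes, String marker) {
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains(marker);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("usage: java MetadataMarkerCheck <file.pdf>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        // "/Info" is the trailer key of the document information dictionary,
        // "/Metadata" the catalog key of the XMP stream,
        // "<?xpacket" the start of an XMP packet.
        for (String marker : new String[] {"/Info", "/Metadata", "<?xpacket"}) {
            System.out.println(marker + " present: " + containsMarker(bytes, marker));
        }
    }
}
```

Run it against the output file, e.g. sample1.pdf, and treat a "present: false" for all three markers as a hint, not as proof, that the metadata is gone.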


Try this code:

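// Assuming iText 5.x (package com.itextpdf.text.pdf)
import java.io.FileOutputStream;
import java.util.HashMap;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;

PdfReader readInputPDF = new PdfReader("sample.pdf");
HashMap<String, String> hMap = readInputPDF.getInfo();
PdfStamper stamper = new PdfStamper(readInputPDF, new FileOutputStream("sample1.pdf"));
hMap.put("Author", null); // a null value removes the entry
stamper.setMoreInfo(hMap);
stamper.close();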

For every metadata property you want to remove from the PDF, put its key into the map with a null value.
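
If you want to clear everything rather than a single key, the map can be built in one step. This is a minimal sketch in plain Java with no iText dependency; the class InfoKeyClearer and the method clearAll are hypothetical names, not iText API. It marks the nine standard information dictionary keys plus any custom keys already present in the document, and the result is meant to be passed to stamper.setMoreInfo(...) as in the code above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of iText.
class InfoKeyClearer {

    // The standard keys of the document information dictionary.
    static final String[] STANDARD_KEYS = {
        "Title", "Author", "Subject", "Keywords", "Creator",
        "Producer", "CreationDate", "ModDate", "Trapped"
    };

    // Builds a map in which every standard key, and every key found in
    // the existing info map (e.g. from reader.getInfo()), maps to null.
    static HashMap<String, String> clearAll(Map<String, String> existing) {
        HashMap<String, String> cleared = new HashMap<>();
        for (String key : STANDARD_KEYS) {
            cleared.put(key, null);
        }
        for (String key : existing.keySet()) {
            cleared.put(key, null); // also covers custom, non-standard keys
        }
        return cleared;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new HashMap<>();
        existing.put("Author", "Jane Doe");
        existing.put("CustomField", "internal");
        // Prints the keys marked for removal (iteration order is unspecified).
        System.out.println(clearAll(existing).keySet());
    }
}
```

With the code above this would be used as stamper.setMoreInfo(InfoKeyClearer.clearAll(readInputPDF.getInfo())).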

Tags: Pdf, Itext, Pdfbox