How can I convert a Word document to PDF?

This is quite a hard task, ever harder if you want perfect results (impossible without using Word) as such the number of APIs that just do it all for you in pure Java and are open source is zero I believe (Update: I am wrong, see below).

Your basic options are as follows:

  1. Using JNI/a C# web service/etc script MS Office (only option for 100% perfect results)
  2. Using the available APIs script Open Office (90+% perfect)
  3. Use Apache POI & iText (very large job, will never be perfect).

Update - 2016-02-11 Here is a cut down copy of my blog post on this subject which outlines existing products that support Word-to-PDF in Java.

Converting Microsoft Office (Word, Excel) documents to PDFs in Java

Three products that I know of can render Office documents:

yeokm1/docs-to-pdf-converter Irregularly maintained, Pure Java, Open Source Ties together a number of libraries to perform the conversion.

xdocreport Actively developed, Pure Java, Open Source It's Java API to merge XML document created with MS Office (docx) or OpenOffice (odt), LibreOffice (odt) with a Java model to generate report and convert it if you need to another format (PDF, XHTML...).

Snowbound Imaging SDK Closed Source, Pure Java Snowbound appears to be a 100% Java solution and costs over $2,500. It contains samples describing how to convert documents in the evaluation download.

OpenOffice API Open Source, Not Pure Java - Requires Open Office installed OpenOffice is a native Office suite which supports a Java API. This supports reading Office documents and writing PDF documents. The SDK contains an example in document conversion (examples/java/DocumentHandling/DocumentConverter.java). To write PDFs you need to pass the "writer_pdf_Export" writer rather than the "MS Word 97" one. Or you can use the wrapper API JODConverter.

JDocToPdf - Dead as of 2016-02-11 Uses Apache POI to read the Word document and iText to write the PDF. Completely free, 100% Java but has some limitations.


Check out docs-to-pdf-converter on github. Its a lightweight solution designed specifically for converting documents to pdf.

Why?

I wanted a simple program that can convert Microsoft Office documents to PDF but without dependencies like LibreOffice or expensive proprietary solutions. Seeing as how code and libraries to convert each individual format is scattered around the web, I decided to combine all those solutions into one single program. Along the way, I decided to add ODT support as well since I encountered the code too.


Docx4j is open source and the best API for convert Docx to pdf without any alignment or font issue.

Maven Dependencies:

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-Internal</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-MOXy</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-export-fo</artifactId>
    <version>8.0.0</version>
</dependency>

Code:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;

import org.docx4j.Docx4J;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;

public class DocToPDF {

    public static void main(String[] args) {
        
        try {
            InputStream templateInputStream = new FileInputStream("D:\\\\Workspace\\\\New\\\\Sample.docx");
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(templateInputStream);
            MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();

            String outputfilepath = "D:\\\\Workspace\\\\New\\\\Sample.pdf";
            FileOutputStream os = new FileOutputStream(outputfilepath);
            Docx4J.toPDF(wordMLPackage,os);
            os.flush();
            os.close();
        } catch (Throwable e) {

            e.printStackTrace();
        } 
    }

}

You can use JODConverter for this purpose. It can be used to convert documents between different office formats. such as:

  1. Microsoft Office to OpenDocument, and vice versa
  2. Any format to PDF
  3. And supports many more conversion as well
  4. It can also convert MS office 2007 documents to PDF as well with almost all formats

More details about it can be found here: http://www.artofsolving.com/opensource/jodconverter

Tags:

Java

Pdf

Ms Word