Tesseract - ERROR net.sourceforge.tess4j.Tesseract - null

My guess is that a GhostscriptException is thrown but not logged properly, and the logging call itself then causes a NullPointerException:

https://github.com/nguyenq/tess4j/blob/212d72bc2ec8b3a4d4f5a18f1eb01a0622fc5521/src/main/java/net/sourceforge/tess4j/util/PdfUtilities.java#L107

106        } catch (GhostscriptException e) {
107            logger.error(e.getCause().toString(), e);
108        } finally {

On line 107, e.getCause() is (probably) null, so calling toString() on it throws an NPE.

(Per the spec, getCause() can return null: https://docs.oracle.com/javase/7/docs/api/java/lang/Throwable.html#getCause(). GhostscriptException also allows the cause to be null: http://grepcode.com/file/repo1.maven.org/maven2/org.ghost4j/ghost4j/1.0.0/org/ghost4j/GhostscriptException.java)
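To illustrate the point without pulling in tess4j or ghost4j, here is a minimal sketch using plain java.lang.Exception. The class name CauseDemo and the helper describe are made up for this example; they are not part of either library. It shows that an exception created without a cause returns null from getCause(), and one way a log message could be built without dereferencing that null.

```java
import java.io.IOException;

class CauseDemo {
    // Hypothetical helper: use the cause if there is one, else the exception itself.
    static String describe(Throwable e) {
        Throwable cause = e.getCause();
        return (cause != null ? cause : e).toString();
    }

    public static void main(String[] args) {
        // An exception built from a message only has no cause.
        Exception noCause = new Exception("Ghostscript failed");
        System.out.println(noCause.getCause());   // prints "null"
        // noCause.getCause().toString() would throw NullPointerException here.
        System.out.println(describe(noCause));    // prints "java.lang.Exception: Ghostscript failed"

        // With a cause attached, the cause is what gets described.
        Exception withCause = new Exception("Ghostscript failed", new IOException("file not found"));
        System.out.println(describe(withCause));  // prints "java.io.IOException: file not found"
    }
}
```

In the catch block quoted above, the equivalent change would be to pass something like describe(e) as the message instead of e.getCause().toString(); that is a suggestion, not what the library currently does.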

To verify this (without recompiling all of tess4j), you could start your program in debug mode and put a breakpoint at line 107. That will show you the real exception.


As @Piotr R mentioned, the error was that ghostscriptException.getCause() is null. The reason was that the path configured in the File object passed to Tesseract was not a valid one. Tesseract's definition of "valid" is a bit different from what you might expect: it only accepts a local path, so passing a file located on AWS S3 throws an error even if the file is public. The solution was to save the file locally and delete it after Tesseract is done.
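A rough sketch of that "save locally, process, delete" approach, using only the JDK. The names LocalCopyOcr, FileTask and withLocalCopy are hypothetical and only for illustration. The callback is where the OCR call would go, for example a lambda that calls tesseract.doOCR(file) on an already configured tess4j instance. I am assuming the remote file is reachable through a plain URL, and the suffix parameter is there so the temporary file can keep an extension such as ".pdf".

```java
import java.io.File;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class LocalCopyOcr {
    /** Hypothetical callback; with tess4j it could wrap tesseract.doOCR(localFile). */
    interface FileTask<T> {
        T run(File localFile) throws Exception;
    }

    /**
     * Copies the resource at 'source' to a temporary local file, runs 'task'
     * on that file, and deletes the temporary file afterwards, even on failure.
     */
    static <T> T withLocalCopy(URI source, String suffix, FileTask<T> task) throws Exception {
        Path tmp = Files.createTempFile("ocr-input-", suffix);
        try {
            // Download the remote content into the temp file.
            try (InputStream in = source.toURL().openStream()) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
            // The task only ever sees a local file.
            return task.run(tmp.toFile());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Usage would look roughly like `String text = LocalCopyOcr.withLocalCopy(URI.create(publicUrl), ".pdf", f -> tesseract.doOCR(f));`, where publicUrl and tesseract are placeholders for your own values. For a private S3 object you would download through the AWS SDK instead of a plain URL.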