Extract OCR text from Evernote

The Evernote API can return the recognized text together with the rectangle where that text appears inside the image. See http://evernote.com/about/developer/api/evernote-api.htm, check out "Evernote Recognition Index XML Format" and the functions to retrieve it. The problem is that they don't do traditional OCR: their algorithm may produce several different candidate words for a single "word" in the image. All they use it for is search, so this is fine for them, but not fine for using it as a recognition engine. (Although they give you a weight for each word alternative, so maybe you can use that.)
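If you want to try the weight idea, here is a minimal sketch in Python. It assumes the recognition index looks roughly like the sample below: `item` elements carrying `x`, `y`, `w`, `h` attributes for the rectangle, each with `t` children whose `w` attribute is the weight. Check that against the format document before relying on it. The sample XML and the function name `best_guess` are made up for illustration, and fetching the index from the API is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the element and attribute names are an assumption
# about the recognition index format, not copied from a real note.
SAMPLE = """<recoIndex objWidth="200" objHeight="50">
  <item x="10" y="5" w="60" h="20">
    <t w="87">clue</t>
    <t w="83">due</t>
  </item>
  <item x="80" y="5" w="70" h="20">
    <t w="95">note</t>
  </item>
</recoIndex>"""


def best_guess(xml_text):
    """Return one (text, weight, (x, y, w, h)) tuple per <item>,
    keeping only the highest-weighted alternative."""
    root = ET.fromstring(xml_text)
    results = []
    for item in root.iter("item"):
        # Collect every alternative as (weight, text).
        candidates = [
            (int(t.get("w", "0")), (t.text or "").strip())
            for t in item.findall("t")
        ]
        if not candidates:
            continue
        # On a tie, max() keeps the first alternative listed.
        weight, text = max(candidates, key=lambda c: c[0])
        box = tuple(int(item.get(k, "0")) for k in ("x", "y", "w", "h"))
        results.append((text, weight, box))
    return results


if __name__ == "__main__":
    for text, weight, box in best_guess(SAMPLE):
        print(text, weight, box)
```

Keep in mind this only picks the most likely candidate per region; a winner with a low weight, or one that barely beats the runner-up, can easily be wrong.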


Also, Evernote apparently doesn't decide a particular image is equivalent to exactly one word - e.g., Evernote doesn't determine that a particular image is "clue" and is not "due". Rather, it will track both, and a search for either would return the same image. Hence, there's no way to get a full-text equivalent because Evernote isn't deciding what the full text actually is, only what it could be.
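To make the "clue" vs. "due" point concrete, here is a small Python sketch of how a search index built from such data behaves. It is not Evernote's actual implementation; the XML layout (`item` with `x`, `y`, `w`, `h`, and `t` alternatives), the sample data, and the name `build_search_index` are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: one image region with two competing readings.
SAMPLE_INDEX = """<recoIndex>
  <item x="10" y="5" w="60" h="20">
    <t w="87">clue</t>
    <t w="83">due</t>
  </item>
</recoIndex>"""


def build_search_index(xml_text):
    """Map every alternative word to the rectangles it might occupy."""
    index = defaultdict(list)
    for item in ET.fromstring(xml_text).iter("item"):
        box = tuple(int(item.get(k, "0")) for k in ("x", "y", "w", "h"))
        for t in item.findall("t"):
            word = (t.text or "").strip().lower()
            if word:
                # Every alternative is kept; none is chosen as "the" text.
                index[word].append(box)
    return dict(index)


if __name__ == "__main__":
    index = build_search_index(SAMPLE_INDEX)
    # Both queries hit the same region of the same image.
    print(index["clue"], index["due"])
```

Both words point at the same rectangle, which is exactly what you want for search and exactly why there is no single "full text" to read back out.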


Evernote either pays a decent sum to the creator of the OCR engine or paid a decent sum to put something working together. So I really doubt that they will let you get the extracted text (plus its position in the image).

(It could be a business model: scan other people's images and provide good OCR :))

So, the answer is: no.

Tags:

Ocr

Evernote