Hi Chris,
I was playing with it recently.
One of the big issues with tesseract is the tough process of preparing
training sets for multiple fonts and languages.
In addition, we would also have to add an option for image preprocessing
(deskewing, filtering, etc.).
BR,
Oleg
On Wed, Nov 30, 2011 at 8:59 AM
Hey Guys,
FYI: http://code.google.com/p/tesseract-ocr/
I was pointed at this library by someone recently asking me if Tika
was interested in integrating with this library. It's ALv2 licensed, and
seems pretty interesting. I'm going to check it out, but just
wanted to give everyone a heads up.
[
https://issues.apache.org/jira/browse/TIKA-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159397#comment-13159397
]
Nick Burch commented on TIKA-795:
-
We are going to want this variable though, as it's needed
[
https://issues.apache.org/jira/browse/TIKA-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeremy Anderson updated TIKA-795:
-
Attachment: testWORD_embeded.docx
Patch_795_XSLF.patch
Patch to remove unused varia
[PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI -
XSLFSlide.getMasterSheet()
---
Key: TIKA-795
URL: https://issues.apache.org/jira/browse/TIKA-7