Re: Tesseract OCR engine

2011-11-29 Thread Oleg Tikhonov
Hi Chris, I was playing with it recently. One of the big issues with Tesseract is the tough process of preparing training sets for multiple fonts and languages. In addition, we would also have to add an option for image preprocessing (deskewing, filtering, etc.). BR, Oleg On Wed, Nov 30, 2011 at 8:59 AM

Tesseract OCR engine

2011-11-29 Thread Mattmann, Chris A (388J)
Hey Guys, FYI: http://code.google.com/p/tesseract-ocr/ I was pointed at this library recently by someone asking whether Tika was interested in integrating with it. It's ALv2 licensed, and seems pretty interesting. I'm going to check it out, but just wanted to give everyone a heads up.

[jira] [Commented] (TIKA-795) [PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet()

2011-11-29 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159397#comment-13159397 ] Nick Burch commented on TIKA-795: We are going to want this variable though, as it's needed

[jira] [Updated] (TIKA-795) [PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet()

2011-11-29 Thread Jeremy Anderson (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Anderson updated TIKA-795: Attachment: testWORD_embeded.docx, Patch_795_XSLF.patch. Patch to remove unused varia

[jira] [Created] (TIKA-795) [PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet()

2011-11-29 Thread Jeremy Anderson (Created) (JIRA)
[PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet() --- Key: TIKA-795 URL: https://issues.apache.org/jira/browse/TIKA-7