You can build from source if you have an interest (and the bandwidth, time and 
disk space) or pull a nightly build if you don’t want to wait for 1.11, for 
example: 
https://builds.apache.org/view/Tika/job/tika-trunk-jdk1.7/849/org.apache.tika$tika-app/

Thank you, Christian!

Best,

        Tim

From: Brian Young [mailto:bwyoung.s...@gmail.com]
Sent: Wednesday, September 09, 2015 4:09 PM
To: user@tika.apache.org
Subject: Re: tesseract issue

Ah, that is very good, thank you.  Looks like it will be in 1.11.



On Wed, Sep 9, 2015 at 4:00 PM, Christian Wolfe 
<taida...@gmail.com> wrote:
Brian,

I submitted a patch for this bug that was accepted by the team - 
https://github.com/apache/tika/pull/56

I don't think it has made it into any release version yet.

On Wed, Sep 9, 2015 at 3:55 PM, Brian Young 
<bwyoung.s...@gmail.com> wrote:
Hello,

On OS X at least, tesseract and tessdata may not be under a common root.  e.g.:


/opt/local/share/tessdata

/opt/local/bin/tesseract



Unfortunately it looks like TesseractOCRParser does not accommodate this, 
since a single configuration value is used both for finding the binary 
and for setting the TESSDATA_PREFIX environment variable.



Now, TESSDATA_PREFIX does not get set if I do not pass in the path on the 
config object.  However, even though tesseract is in my path, it isn't found 
when the ProcessBuilder executes unless I've given it the full path... which of 
course sets the TESSDATA_PREFIX to the wrong thing.
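
One way to narrow this down is to check what PATH the JVM itself sees, since it can differ from the PATH of an interactive shell. The following is a minimal diagnostic sketch, not Tika code: the class name ExecutableLocator and the method findOnPath are hypothetical, and it assumes the binary is a plain executable file in one of the listed directories.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tika: looks an executable up on a PATH-style string.
public class ExecutableLocator {

    /**
     * Returns the first executable regular file called name in the
     * directories listed in pathValue, or null if none is found.
     */
    static Path findOnPath(String name, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return null;
        }
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as a trailing separator
            }
            Path candidate = Paths.get(dir, name);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return candidate;
            }
        }
        return null;
    }
}
```

Calling findOnPath("tesseract", System.getenv("PATH")) from inside the application that runs Tika shows whether the binary is visible to that JVM; a null result would suggest the JVM was started with a different PATH than the shell.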



It seems like maybe it would be best to handle these as two separate 
configuration values?  But short of that and a new version of Tika, does anyone 
have any other advice?
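
To illustrate the two-value idea, here is a minimal sketch using plain ProcessBuilder. It is not how TesseractOCRParser is implemented; the class name TesseractLauncher, the method build, and its parameter names are hypothetical. It assumes the usual "tesseract <image> <outputbase>" command form, and note that the directory TESSDATA_PREFIX should point at can differ between Tesseract versions.

```java
import java.util.Arrays;

// Hypothetical sketch, not Tika's API: keeps the binary location and the
// tessdata location as two independent settings.
public class TesseractLauncher {

    /**
     * Builds (but does not start) a tesseract process.
     *
     * tesseractBinary: full path to the executable, e.g. /opt/local/bin/tesseract
     * tessdataPrefix:  value for TESSDATA_PREFIX, or null to leave the
     *                  inherited environment untouched
     */
    static ProcessBuilder build(String tesseractBinary, String tessdataPrefix,
                                String inputImage, String outputBase) {
        ProcessBuilder pb = new ProcessBuilder(
                Arrays.asList(tesseractBinary, inputImage, outputBase));
        if (tessdataPrefix != null && !tessdataPrefix.isEmpty()) {
            // Set independently of where the binary lives.
            pb.environment().put("TESSDATA_PREFIX", tessdataPrefix);
        }
        return pb;
    }
}
```

With the layout above, build("/opt/local/bin/tesseract", "/opt/local/share/", "in.png", "out") would give the process the right binary without deriving TESSDATA_PREFIX from the binary's directory.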



Thank you

Brian