You can build from source if you have an interest (and the bandwidth, time and disk space) or pull a nightly build if you don’t want to wait for 1.11, for example: https://builds.apache.org/view/Tika/job/tika-trunk-jdk1.7/849/org.apache.tika$tika-app/
Thank you, Christian!

Best,
Tim

From: Brian Young [mailto:bwyoung.s...@gmail.com]
Sent: Wednesday, September 09, 2015 4:09 PM
To: user@tika.apache.org
Subject: Re: tesseract issue

Ah, that is very good, thank you. Looks like it will be in 1.11.

On Wed, Sep 9, 2015 at 4:00 PM, Christian Wolfe <taida...@gmail.com> wrote:

Brian,

I submitted a patch for this bug that was accepted by the team - https://github.com/apache/tika/pull/56

I don't think it has made it into any release version yet.

On Wed, Sep 9, 2015 at 3:55 PM, Brian Young <bwyoung.s...@gmail.com> wrote:

Hello,

On OS X at least, tesseract and tessdata may not be under a common root, e.g.:

    /opt/local/share/tessdata
    /opt/local/bin/tesseract

Unfortunately, it looks like TesseractOCRParser does not accommodate this, since a single configuration value is used both for finding the binary and for setting the TESSDATA_PREFIX environment variable.

TESSDATA_PREFIX does not get set if I do not pass in the path on the config object. However, even though tesseract is on my PATH, it isn't found when the ProcessBuilder executes unless I've given it the full path, which of course sets TESSDATA_PREFIX to the wrong thing.

It seems like it might be best to handle these as two separate configuration values. But short of that and a new version of Tika, does anyone have any other advice?

Thank you,
Brian
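
For illustration, the "two separate configuration values" idea from the original message could look roughly like the Java sketch below. This is not the code from the accepted patch, nor anything taken from TesseractOCRParser: the class TesseractLauncher and its parameter names are hypothetical. Only java.lang.ProcessBuilder and the TESSDATA_PREFIX variable come from the thread. The directory TESSDATA_PREFIX should point at depends on the Tesseract version in use, so the value below is an assumption.

```java
import java.util.List;

// Hypothetical helper: keeps the binary location and the tessdata
// location as two independent settings instead of one shared path.
class TesseractLauncher {

    // tesseractBinary: full path to the executable (or just "tesseract"
    //                  if it can be resolved from the JVM's PATH).
    // tessdataPrefix:  value for TESSDATA_PREFIX, or null to leave the
    //                  inherited environment untouched.
    static ProcessBuilder build(String tesseractBinary,
                                String tessdataPrefix,
                                String inputImage,
                                String outputBase) {
        ProcessBuilder pb = new ProcessBuilder(tesseractBinary, inputImage, outputBase);
        if (tessdataPrefix != null && !tessdataPrefix.isEmpty()) {
            // Set only when explicitly configured, independent of the binary path.
            pb.environment().put("TESSDATA_PREFIX", tessdataPrefix);
        }
        return pb;
    }

    public static void main(String[] args) {
        // Paths follow the OS X layout described in the thread; the
        // TESSDATA_PREFIX value is an assumed example.
        ProcessBuilder pb = build("/opt/local/bin/tesseract",
                                  "/opt/local/share",
                                  "page.png",
                                  "page");
        List<String> cmd = pb.command();
        System.out.println("command: " + cmd);
        System.out.println("TESSDATA_PREFIX: " + pb.environment().get("TESSDATA_PREFIX"));
        // pb.start() is deliberately not called, so the sketch runs
        // without tesseract being installed.
    }
}
```

With this split, the binary can sit under /opt/local/bin while the data directory is configured on its own, and passing null for the second value leaves any TESSDATA_PREFIX already present in the environment as it is.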