Simply install Tesseract and re-run your test; it should work right away,
because Tika detects whether Tesseract is available. I've used Tika with
Tesseract recently with success, so I know 1.13 works.
John
On 18 July 2016 at 21:43, Gordon Schneider wrote:
> Timothy
>
> That looks promising
Timothy
That looks promising. It will be ugly to work with but extracting text from a
PDF can be no fun either.
I will download Tesseract and see if I can get it working. I will let you
know how it goes.
Thanks
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: July 18, 2016 2:
You'll need to set up Tesseract to run Optical Character Recognition (OCR).
While Tika has an OCR integration, Tesseract itself is not bundled with the app.
See https://wiki.apache.org/tika/TikaOCR
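
As a rough way to check this setup from a script, the sketch below tests whether a `tesseract` executable is on the PATH and then asks tika-app for plain text. It is only an illustration, not part of Tika: the helper names and the example file names (`tika-app-1.13.jar`, `scan.pdf`) are assumptions, and it presumes `java` is installed and on the PATH.

```python
import shutil
import subprocess


def tesseract_available():
    """Return True if a `tesseract` executable is found on the PATH."""
    return shutil.which("tesseract") is not None


def build_tika_command(jar_path, document):
    """Build the tika-app command line that requests plain-text output."""
    return ["java", "-jar", jar_path, "--text", document]


def extract_text(jar_path, document):
    """Run tika-app and return whatever text it extracted (may be empty)."""
    result = subprocess.run(
        build_tika_command(jar_path, document),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def main(jar_path, document):
    """Report whether OCR looks possible, then print the extracted text."""
    if not tesseract_available():
        print("tesseract not found on PATH; scanned pages will likely yield no text")
    text = extract_text(jar_path, document)
    if text.strip():
        print(text)
    else:
        print("No text extracted; the file may contain only scanned images")
```

Calling `main("tika-app-1.13.jar", "scan.pdf")` with your own paths runs the check. An empty result alongside populated metadata is a reasonable hint that the document holds only images and needs OCR.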
For kicks, I ran this through Tika+Tesseract; this is the output you get once
you've set up Tesseract:
SUPPLI
I have tried using the GUI for tika-app-1.13, but it shows no text. I can see
the metadata, but that does not give me the information I need. I have attached
the file.
Maybe it is not possible to extract the text. If so, what should I be looking
for to tell me that it cannot extract the text?
Than