Re: Extract Text from a TIFF image

2016-07-18 Thread John Patrick
Simply install Tesseract and re-run your test; it should work right away, because Tika will detect that Tesseract is available. I've used Tika and Tesseract together recently with success, so I know 1.13 works. John On 18 July 2016 at 21:43, Gordon Schneider wrote: > Timothy > > > > That looks promising

RE: Extract Text from a TIFF image

2016-07-18 Thread Gordon Schneider
Timothy That looks promising. It will be ugly to work with, but extracting text from a PDF can be no fun either. I will download Tesseract and see if I can get it working. I will let you know how it works. Thanks From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: July 18, 2016 2:

RE: Extract Text from a TIFF image

2016-07-18 Thread Allison, Timothy B.
You'll need to set up tesseract to run Optical Character Recognition. While we have an integration with OCR, it is not bundled within the app. See https://wiki.apache.org/tika/TikaOCR For kicks, I ran this through Tika+Tesseract; this is the output you get once you've set up Tesseract: SUPPLI
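
For anyone following along, here is a rough Python sketch of the setup check described above: it looks for a Tesseract executable on the PATH and then builds a tika-app command line that asks for plain-text output. The jar file name and its location in the current directory are assumptions, the input path is whatever file you pass in, and the PATH lookup only approximates whatever detection Tika does internally.

```python
import shutil


def tesseract_available(binary="tesseract"):
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(binary) is not None


def tika_text_command(image_path, tika_jar="tika-app-1.13.jar"):
    """Build the tika-app command line that prints extracted plain text.

    The jar name and location are assumptions; adjust them to your download.
    """
    return ["java", "-jar", tika_jar, "--text", image_path]


if __name__ == "__main__":
    import subprocess
    import sys

    # Only run when exactly one file path is given on the command line.
    if len(sys.argv) == 2:
        if not tesseract_available():
            print("tesseract not found on PATH; image files will likely yield metadata only")
        result = subprocess.run(
            tika_text_command(sys.argv[1]), capture_output=True, text=True
        )
        print(result.stdout)
```

If the check reports that Tesseract is missing, that would be consistent with seeing metadata but no text for a TIFF.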

Extract Text from a TIFF image

2016-07-18 Thread Gordon Schneider
I have tried using the GUI for tika-app-1.13 but it shows nothing. I can see the metadata, but that does not give me the information I need. I have attached the file. Maybe it is not possible to extract the text. If so, what should I be looking for to tell me that it cannot extract the text? Than