Tika is a jar file. Tesseract is a set of native libraries installed into the operating system, which Tika then picks up.
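To make the split concrete, here is a minimal sketch of how a Java program can hand an image to the native tesseract executable as a separate process. The class and method names (`TesseractCommand`, `build`, `run`) are hypothetical and not part of Tika, and this is only an illustration of the general idea, not Tika's actual code. The `-psm` flag is the Tesseract 3.x spelling discussed in this thread; as far as I know, newer releases spell it `--psm`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika: it only illustrates that Tesseract
// is reached as an external native program rather than as a jar.
final class TesseractCommand {

    // Builds: <executable> <image> <outputBase> -psm <psm>
    // Tesseract writes its text result to <outputBase>.txt.
    static List<String> build(String executable, String image, String outputBase, int psm) {
        List<String> cmd = new ArrayList<>();
        cmd.add(executable);
        cmd.add(image);
        cmd.add(outputBase);
        cmd.add("-psm"); // Tesseract 3.x flag name (assumption: 3.x is installed)
        cmd.add(Integer.toString(psm));
        return cmd;
    }

    // Starts the native process and returns its exit code.
    // This only works if tesseract is installed and on the PATH.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Print the command instead of running it, so the sketch works
        // on machines without Tesseract installed.
        List<String> cmd = build("tesseract", "Maxfield-1.tiff", "Maxfield-1", 1);
        System.out.println(String.join(" ", cmd));
    }
}
```

The point of the sketch is that the Java side only assembles arguments and waits for an exit code; all of the OCR work happens in the native binary, which is why it cannot simply be dropped onto a server as a jar.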
If you mean that Tesseract itself needs to be a jar file that can be installed on your Power Server, you'll need to go to the Tesseract project and see if they will implement those changes specifically for your project. I very much doubt they will, as it's C/C++ code, not Java.

John

On 21 July 2016 at 17:26, Gordon Schneider <schneid...@transampiping.com> wrote:

> I downloaded the version you mentioned and I got it to work. The results were the same as yours.
>
> But after review, this unfortunately will not work for us, as we need something that is in a jar file that we can install on our Power Server. The project that the OCR will eventually be part of has this as a requirement.
>
> If you have any other suggestions, please let me know.
>
> Thanks for your time. It is very much appreciated.
>
> Gord
>
> *From:* Allison, Timothy B. [mailto:talli...@mitre.org]
> *Sent:* July 19, 2016 9:58 AM
> *To:* user@tika.apache.org
> *Subject:* RE: Extract Text from a TIFF image
>
> You might want to experiment with different -psm values; we use 1 by default.
>
> Also, which version of Tesseract? I think I got mine from (https://github.com/UB-Mannheim/tesseract/wiki), version:
>
> tesseract 3.05.00dev
> leptonica-1.73
> libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0
>
> *From:* Gordon Schneider [mailto:schneid...@transampiping.com <schneid...@transampiping.com>]
> *Sent:* Tuesday, July 19, 2016 11:22 AM
> *To:* 'user@tika.apache.org' <user@tika.apache.org>
> *Subject:* RE: Extract Text from a TIFF image
>
> I installed tesseract on my PC. I ran tesseract on its own using the following command:
>
> tesseract.exe x:/java/PDFBox/Maxfield-1.tiff x:/java/PDFBox/Maxfield-1
>
> The results are in the attached file. Not as clean as the results Timothy got.
> I am closer to where I want to get to, but obviously I am still a number of steps from my ideal solution. How do I get the same results Timothy got?
>
> Thanks
>
> Gord
>
> *From:* Allison, Timothy B. [mailto:talli...@mitre.org <talli...@mitre.org>]
> *Sent:* July 18, 2016 2:25 PM
> *To:* user@tika.apache.org
> *Subject:* RE: Extract Text from a TIFF image
>
> You’ll need to set up tesseract to run Optical Character Recognition. While we have an integration with OCR, it is not bundled within the app.
>
> See https://wiki.apache.org/tika/TikaOCR
>
> For kicks, I ran this through Tika+Tesseract; this is the output you get once you’ve set up Tesseract:
>
> SUPPLIER: 3177
> Invoice Date Description Amount Discount Net Amount
> 015-28339 06/08/2015 21,318.54 0.00 21,318.54
> C15-28837 06/04/2015 1,529.75 0.00 1,529.75
> 01528978 06/04/2015 1,238.18 0.00 1,238.18
> 015-28978-01 06/04/2015 1,182.85 0.00 1,182.85
> 015-28439 06/01/2015 1,113.86 0.00 1,113.86
> C15-29707 06/11/2015 886.84 0.00 886.64
> C15-28978-02 06/04/2015 526.91 0.00 526.91
> 01529385 06/09/2015 199.29 0.00 199.29
> C15~28439~01 06/03/2015 157.34 0.00 157.34
> C15-28670 06/03/2015 136.52 0.00 136.52
> C15—28314-01 06/03/2015 132.81 0.00 132.81
> 015-28576 06/02/2015 61.26 0.00 61.26
> 015-29413 06/11/2015 22.37 0.00 22.37
> Cheque #: 83077 Cheque Date 7/14/2015 28,506.32 0.00 28,506.32
> SUPPLIER: 3177
> Invoice Date Description Amount Discount Net Amount
> C15-28339 06/08/2015 21,318.54 0.00 21,318.54
> 015-28837 06/04/2015 1,529.75 0.00 1,529.75
> 015-28978 06/04/2015 1,238.18 0.00 1,238.18
> 015-28978-01 06I04/2015 1 ,18285 0.00 1,182.85
> C15-28439 06/01/2015 1,113.86 0.00 1,113.86
> 015-29707 06l11/2015 886.64 0.00 886.64
> C15-28978~02 06/04/2015 526.91 0.00 526.91
> 015-29385 06/09/2015 199.29 0.00 199.29
> C15-28439-01 06/03/2015 157.34 0.00 157.34
> 015-28670 06/03/2015 136.52 0.00 136.52
> 015-28314—01 06/03/2015 132.81 0.00 132.81
> C15-28576 06/02/2015 61.26 0.00 61.26
> 015-29413 06/11/2015 22.37 0.00 22.37
> Cheque #1 83077 Check Daie: 7/14/2015 28,506.32 0.00 28,506.32
> 07142015 MMDDYYYY
> TWENTY-EIGHT THOUSAND FIVE HUNDRED SIX CAD AND 32/ 100 $ "******28,506.32
> Trans Am Piping Canada
>
> *From:* Gordon Schneider [mailto:schneid...@transampiping.com <schneid...@transampiping.com>]
> *Sent:* Monday, July 18, 2016 4:05 PM
> *To:* 'user@tika.apache.org' <user@tika.apache.org>
> *Subject:* Extract Text from a TIFF image
>
> I have tried using the GUI for tika-app-1.13 but it shows nothing. I can see the metadata, but that does not give me the information I need. I have attached the file.
>
> Maybe it is not possible to extract the text. If so, what should I be looking for to tell me that it cannot extract the text?
>
> Thanks
>
> Gordon Schneider
> 403-236-0601
> Trans Am Piping Products Ltd.