subject:"Tess4j API for TIKA OCR parser"

Re: Tess4j API for TIKA OCR parser

2017-03-08 Thread Thejan Wijesinghe

> Sent: Tuesday, March 7, 2017 10:38 AM > To: Thejan Wijesinghe > Cc: dev@tika.apache.org > Subject: Re: Tess4j API for TIKA OCR parser > > Thanks Nick for the reply. > > Thejan, > > I am glad to know your progress. Rewriting the TesseractOCRParser would be > the ulti

RE: Tess4j API for TIKA OCR parser

2017-03-07 Thread Thamme Gowda

apache.org] Sent: Tuesday, March 7, 2017 10:38 AM To: Thejan Wijesinghe Cc: dev@tika.apache.org Subject: Re: Tess4j API for TIKA OCR parser Thanks Nick for the reply. Thejan, I am glad to know your progress. Rewriting the TesseractOCRParser would be the ultimate goal if using Tess4j proves to be b

RE: Tess4j API for TIKA OCR parser

2017-03-07 Thread Allison, Timothy B.

Y and why not give the new tika-eval module a trial to evaluate the differences in output? :) -Original Message- From: Thamme Gowda [mailto:thammego...@apache.org] Sent: Tuesday, March 7, 2017 10:38 AM To: Thejan Wijesinghe Cc: dev@tika.apache.org Subject: Re: Tess4j API for TIKA OCR

RE: Tess4j API for TIKA OCR parser

2017-03-07 Thread Allison, Timothy B.

+1 Same experience, of same vintage. :) -Original Message- From: Luís Filipe Nassif [mailto:lfcnas...@gmail.com] Sent: Tuesday, March 7, 2017 10:34 AM To: dev@tika.apache.org Subject: Re: Tess4j API for TIKA OCR parser Hi Thejan, Before the first version of TesseractOcrParser was

Re: Tess4j API for TIKA OCR parser

2017-03-07 Thread Luís Filipe Nassif

Hi Thejan, Before the first version of TesseractOcrParser was committed, I tried to use Tess4j; that was 4 years ago. Unfortunately, at that time I ran into some problems like permanent hangs with tesseract/Tess4j and, even worse, JVM crashes because of bugs in native code (pointers to crazy addresses

Re: Tess4j API for TIKA OCR parser

2017-03-07 Thread Thamme Gowda

Thanks Nick for the reply. Thejan, I am glad to know your progress. Rewriting the TesseractOCRParser would be the ultimate goal if using Tess4j proves to be better than the way it is done currently. But, for now, please consider these: + Rename your class to *Tess4jOCRParser*. It is a new parser

Re: Tess4j API for TIKA OCR parser

2017-03-07 Thread Thejan Wijesinghe

Hi Nick, I thought the same thing. I will try to keep the public method signatures unchanged and will send updates on my progress. On Tue, Mar 7, 2017 at 5:48 PM, Nick Burch wrote: > On Tue, 7 Mar 2017, Thejan Wijesinghe wrote: > >> I have already used the Tess4j API to rewrite the TesseractOCRP

Re: Tess4j API for TIKA OCR parser

2017-03-07 Thread Nick Burch

On Tue, 7 Mar 2017, Thejan Wijesinghe wrote: I have already used the Tess4j API to rewrite the TesseractOCRParser class. Although it successfully extracts content from most of the file types, it fails some particular unit tests in the TesseractOCRParserTest class. I can solve that. However, I want

Re: Tess4j API for TIKA OCR parser

2017-03-07 Thread Thejan Wijesinghe

Hi Thamme, I made minimal changes to the TesseractOCRParser class. I basically changed the private doOCR() method. But the existing unit tests fail even though the content and metadata get extracted. Could you provide me with any guidance on resolving these errors by running the test cases? I

Re: Tess4j API for TIKA OCR parser

2017-03-06 Thread Thejan Wijesinghe

Thamme, I have already used the Tess4j API to rewrite the TesseractOCRParser class. Although it successfully extracts content from most of the file types, it fails some particular unit tests in the TesseractOCRParserTest class. I can solve that. However, I want to know whether I can rewrite the enti

Re: Tess4j API for TIKA OCR parser

2017-03-05 Thread Thamme Gowda

Thejan, Welcome to the world of mysteries. I am unable to explain why you are facing it since I am unable to reproduce it. Try out a few other images; maybe the image you have chosen is corrupt, and maybe there is an exception thrown and silently swallowed in code. I suggest you do this: Please

Tess4j API for TIKA OCR parser

2017-03-04 Thread Thejan Wijesinghe

Hi Thamme, Yes. I am using Ubuntu :) and I had ImageMagick and Tesseract both installed on my system using apt-get. Since I wasn't sure whether this is a problem with the APT software packages, I built both ImageMagick and Tesseract from sources. I also double checked the availability of Tessera

12 matches
