> Sent: Tuesday, March 7, 2017 10:38 AM
> To: Thejan Wijesinghe
> Cc: dev@tika.apache.org
> Subject: Re: Tess4j API for TIKA OCR parser
>
> Thanks, Nick, for the reply.
>
> Thejan,
>
> I am glad to know your progress. Rewriting the TesseractOCRParser would be
> the ulti
... and why not give the new tika-eval module a try to evaluate the differences
in output? :)
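tika-eval is the proper tool for that comparison. As a much cruder sanity check before setting it up, the sketch below (plain Java, not tika-eval's API) computes the Jaccard overlap of the word sets produced for the same image by the current command-line path and by a Tess4j-based parser. The class name and the two input strings are hypothetical placeholders.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class OcrOutputDiff {
    // Lower-cases the text and splits it on runs of characters that are
    // neither letters nor digits; empty fragments are dropped.
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Jaccard similarity of the two token sets: |A intersect B| / |A union B|.
    // Two empty outputs are treated as identical.
    static double jaccard(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 1.0;
        }
        Set<String> inter = new HashSet<>(ta);
        inter.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        // Hypothetical outputs for the same image from the two code paths.
        String cliOutput = "Hello World, this is OCR.";
        String tess4jOutput = "Hello World this is 0CR";
        System.out.println("overlap = " + jaccard(cliOutput, tess4jOutput));
    }
}
```

A score well below 1.0 on images where both paths should agree is a hint to look closer; tika-eval reports far richer statistics than this single number.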
-Original Message-
From: Thamme Gowda [mailto:thammego...@apache.org]
Sent: Tuesday, March 7, 2017 10:38 AM
To: Thejan Wijesinghe
Cc: dev@tika.apache.org
Subject: Re: Tess4j API for TIKA OCR
+1
Same experience, of the same vintage. :)
-Original Message-
From: Luís Filipe Nassif [mailto:lfcnas...@gmail.com]
Sent: Tuesday, March 7, 2017 10:34 AM
To: dev@tika.apache.org
Subject: Re: Tess4j API for TIKA OCR parser
Hi Thejan,
Before the first version of TesseractOCRParser was committed, I tried to use
Tess4j; that was 4 years ago. Unfortunately, at that time I ran into some
problems, like permanent hangs with Tesseract/Tess4j and, even worse, JVM
crashes caused by bugs in native code (pointers to crazy addresses
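Given the hangs described above, one minimal safeguard for an in-process Tess4j call is to run it on a worker thread with a timeout. The sketch below assumes the OCR call is wrapped in a `Callable<String>` (for example around Tess4j's `doOCR`); the class name and the lambdas in `main` are hypothetical stand-ins. It cannot protect against the JVM crashes mentioned: a segfault in native code takes the whole process down, which is one argument for keeping Tesseract in a separate process.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class OcrWithTimeout {
    // Runs an OCR call on a daemon worker thread and gives up after timeoutMillis.
    // cancel(true) only interrupts the thread: native code that ignores
    // interrupts may keep running in the background.
    static String runWithTimeout(Callable<String> ocrCall, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "ocr-worker");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<String> future = executor.submit(ocrCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // Rethrow the OCR engine's own exception instead of the wrapper.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a fast, successful OCR call.
        System.out.println(runWithTimeout(() -> "recognized text", 1000));
        // Hypothetical stand-in for a call that hangs.
        try {
            runWithTimeout(() -> { Thread.sleep(10_000); return "never"; }, 100);
        } catch (TimeoutException e) {
            System.out.println("OCR timed out");
        }
    }
}
```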
Thanks, Nick, for the reply.
Thejan,
I am glad to know your progress. Rewriting the TesseractOCRParser would be
the ultimate goal if using Tess4j proves to be better than the way it is
done currently.
But, for now, please consider these:
+ Rename your class to *Tess4jOCRParser*. It is a new parser
Hi Nick,
I thought the same thing. I will try to keep the public method signatures
unchanged and will send updates on my progress.
On Tue, Mar 7, 2017 at 5:48 PM, Nick Burch wrote:
> On Tue, 7 Mar 2017, Thejan Wijesinghe wrote:
>
>> I have already used the Tess4j API to rewrite the TesseractOCRP
On Tue, 7 Mar 2017, Thejan Wijesinghe wrote:
I have already used the Tess4j API to rewrite the TesseractOCRParser class.
Although it successfully extracts content from most file types, it
fails some particular unit tests in the TesseractOCRParserTest class. I can
solve that. However, I want
Hi Thamme,
I made minimal changes to the TesseractOCRParser class; I basically changed
the private doOCR() method. But the existing unit tests fail even
though the content and metadata are extracted. Could you give me
any guidance on resolving the errors I get when running the test cases? I
Thamme,
I have already used the Tess4j API to rewrite the TesseractOCRParser class.
Although it successfully extracts content from most file types, it
fails some particular unit tests in the TesseractOCRParserTest class. I can
solve that. However, I want to know whether I can rewrite the enti
Thejan,
Welcome to the world of mysteries. I am unable to explain why you are
facing this, since I am unable to reproduce it.
Try a few other images; maybe the image you have chosen is corrupt and
an exception is being thrown and silently swallowed somewhere in the code.
I suggest you do this:
Please
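To illustrate the "silently swallowed" case: if a catch block returns empty text, a corrupt image just yields empty output and the test fails without any clue why. The sketch below contrasts that with rethrowing while keeping the original exception as the cause. The class name and `ocr` are hypothetical stand-ins; inside a real Tika parser you would more likely wrap the error in a `TikaException`, but `IllegalStateException` keeps the example dependency-free.

```java
import java.io.IOException;

public class SwallowedException {
    // Hypothetical OCR step that always fails, as if the image were corrupt.
    static String ocr(String imagePath) throws IOException {
        throw new IOException("cannot decode image: " + imagePath);
    }

    // Anti-pattern: the failure disappears and callers only see empty text.
    static String extractSwallowing(String imagePath) {
        try {
            return ocr(imagePath);
        } catch (IOException e) {
            return "";
        }
    }

    // Better: fail loudly and keep the original exception as the cause,
    // so the stack trace in the test report points at the real problem.
    static String extractPropagating(String imagePath) {
        try {
            return ocr(imagePath);
        } catch (IOException e) {
            throw new IllegalStateException("OCR failed for " + imagePath, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("swallowed: '" + extractSwallowing("scan.png") + "'");
        try {
            extractPropagating("scan.png");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + " <- " + e.getCause().getMessage());
        }
    }
}
```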
Hi Thamme,
Yes, I am using Ubuntu :) and I had both ImageMagick and Tesseract
installed on my system via apt-get. Since I wasn't sure whether this was
a problem with the APT packages, I built both ImageMagick and
Tesseract from source.
I also double-checked the availability of Tessera
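One way to double-check from Java that the binaries are visible to the JVM (which may see a different PATH than your shell, e.g. when launched from an IDE) is simply to try starting them. A minimal sketch, assuming each tool accepts `--version` and exits with status 0 when it works; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class BinaryCheck {
    // Returns true if `command --version` can be started and exits with
    // status 0 within 10 seconds. Output is discarded (needs Java 9+).
    static boolean isAvailable(String command) {
        ProcessBuilder pb = new ProcessBuilder(command, "--version");
        pb.redirectErrorStream(true);
        pb.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        try {
            Process p = pb.start();
            if (!p.waitFor(10, TimeUnit.SECONDS)) {
                p.destroyForcibly(); // hung: treat as unavailable
                return false;
            }
            return p.exitValue() == 0;
        } catch (IOException e) {
            return false; // not found on PATH, or not executable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("tesseract on PATH: " + isAvailable("tesseract"));
        System.out.println("convert on PATH:   " + isAvailable("convert"));
    }
}
```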