Re: Improving Tika OCR

2017-04-21 Thread Thamme Gowda
Thanks, Kranthi. Keep us informed about how it goes. Cheers, TG On Thu, Apr 20, 2017 at 1:01 PM, Kranthi Kiran G V < kkran...@student.nitw.ac.in> wrote: > Hello Thamme, > > Agreed. Looking at the paper[1], it seems to me that tesseract and VGG > models can co-exist > in Tika to serve all kinds

Re: Improving Tika OCR

2017-04-20 Thread Kranthi Kiran G V
Hello Thamme, Agreed. Looking at the paper[1], it seems to me that tesseract and VGG models can co-exist in Tika to serve all kinds of input images. I am able to run one of the models, Deep Features for Text Spotting[2], with the GPU disabled. However, it doesn't generate any text, but generates

Re: Improving Tika OCR

2017-04-19 Thread Thamme Gowda
Hi Kranthi, Thanks for updating us. I believe that in the long run both models may co-exist (tesseract for flatbed scanner images with perfect lighting conditions, VGG models for natural images taken by cellphone/digital cameras with weird orientations and lighting conditions). I

Re: Improving Tika OCR

2017-04-19 Thread Kranthi Kiran G V
Hello community, I have successfully tested Tesseract 4.0 on various images of different sizes, orientations, and lighting conditions. In the next few days, I will publish the results on a blog for you to have a look at. Although I'm able to reliably measure the clock time, accuracy, etc., I am

Re: Improving Tika OCR

2017-04-17 Thread Kranthi Kiran G V
Hello Luis, Yes, tesseract 4.0 is not yet a stable release. The VGG group's model has a 3-clause BSD license. I see it as a long-term effort that would help the Tika community experience near state-of-the-art OCR. This is an investigation to see whether we can pursue this direction. Thanks for

Re: Improving Tika OCR

2017-04-17 Thread Luís Filipe Nassif
Hi Kranthi, That is an interesting comparison! But I think Tesseract 4.0 is still alpha? And do you know the VGG software license? Best, Luis On Apr 17, 2017 8:46 AM, "Kranthi Kiran G V" < kkran...@student.nitw.ac.in> wrote: Hello Tim Allison, I am currently working on improving

Re: Improving Tika OCR

2017-04-17 Thread Thamme Gowda
Thanks, Kranthi, for volunteering to do this evaluation :-) Best, Thamme -- Thamme Gowda TG | @thammegowda ~Sent via somebody's IMAP server On Apr 17, 2017 4:46 AM, "Kranthi Kiran G V" wrote: Hello Tim Allison, I am currently working on improving Tika's OCR

Improving Tika OCR

2017-04-17 Thread Kranthi Kiran G V
Hello Tim Allison, I am currently working on improving Tika's OCR capabilities. At the suggestion of Thamme Gowda (@thammegowda), I started working on a comparison of Tesseract 4.0's neural network