Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-22 Thread Allistair C
In OpenCV, binarisation is one line of code: the function is called threshold, and you can choose among various threshold types. If I remember tomorrow, I'll post some Android demo code.
> On 22 Jan 2015, at 21:08, newbie wrote:
> OK, my question should have been phrased better, I apologize.
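To make the idea of thresholding concrete, here is a minimal sketch of global binarisation with an automatically chosen (Otsu) threshold. It is written in plain Python on nested lists and is not OpenCV's implementation; the function names `otsu_threshold` and `binarize` are hypothetical, and the input is assumed to be 8-bit grayscale values (0-255).

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for an iterable of 0-255 grayscale values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of their gray values
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor); Otsu maximises it.
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(rows, threshold=None):
    """Map each pixel to 0 or 255; pick the threshold with Otsu if not given."""
    if threshold is None:
        threshold = otsu_threshold(p for row in rows for p in row)
    return [[255 if p > threshold else 0 for p in row] for row in rows]


if __name__ == "__main__":
    image = [[10, 12, 200], [11, 198, 205]]
    print(binarize(image))
```

A fixed threshold can be passed instead, for example `binarize(image, threshold=128)`, which corresponds to the simplest of the "various types" mentioned above.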

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-22 Thread Allistair C
At what point will you use Google to answer these simple questions? OpenCV has already been mentioned many times.
> On 22 Jan 2015, at 18:39, newbie wrote:
> Any idea what free source is available for binarizing in Java?
> Thanks
>> On Thursday, January 22, 201

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-22 Thread newbie
Any idea what free source is available for binarizing in Java? Thanks
On Thursday, January 22, 2015 at 12:32:53 PM UTC-5, rkomar wrote:
> On Tue, 20 Jan 2015, newbie wrote:
> > I found that vip1200.jpg works at a scale of 8654 px wide by 5748 px high, but most of the time I either

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-22 Thread Robert Komar
On Tue, 20 Jan 2015, newbie wrote:
> I found that vip1200.jpg works at a scale of 8654 px wide by 5748 px high, but most of the time I get either an "Invalid mem access" or an out-of-memory (heap) error before I am able to rescale to the optimal scale. I need to come up with some other generic way to upsc

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-22 Thread Art W Rhyno
> Now with Olena, does it provide an API instead of a tool to preprocess (mark text regions) the image programmatically?
Hi,
If it looks like Olena would work for you, look at the source for the "content_in_hdoc_hdlac" program in the distribution; it shows how to use Olena programmatically. Good lu

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-22 Thread Allistair
Not exactly an answer, but someone else with the same issue has gotten most of the way there.
http://stackoverflow.com/questions/24385714/detect-text-region-in-image-using-opencv
On 22 January 2015 at 15:35, newbie wrote:
> ShreeDevi,
> ImageMagick seems like a manual tool, but I think the pro

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-22 Thread newbie
ShreeDevi,
ImageMagick seems like a manual tool, but I think the problem I need to solve is a generic way of preprocessing all images.
Art,
I have been looking for a text-region segmentation tool and had found only one, from MathWorks, that looked promising. Now with Olena, does it provide a

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-20 Thread ShreeDevi Kumar
Have you looked at ImageMagick and related scripts for pre-processing the images?
ShreeDevi
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Wed, Jan 21, 2015 at 1:30 AM, newbie wrote:
> I found that vip1200.jpg works at scale

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-20 Thread newbie
I found that vip1200.jpg works at a scale of 8654 px wide by 5748 px high, but most of the time I get either an "Invalid mem access" or an out-of-memory (heap) error before I am able to rescale to the optimal scale. I need to come up with some other generic way to upscale and OCR images. Any ideas are
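As a rough illustration of why an image of this size can exhaust the heap, the following sketch estimates the size of a single uncompressed copy of an 8654 x 5748 raster at different pixel depths. The bytes-per-pixel values are assumptions (the thread does not say which pixel format is used), and `raster_bytes` is a hypothetical helper; real libraries may hold several copies at once while rescaling.

```python
def raster_bytes(width, height, bytes_per_pixel=4):
    """Size in bytes of one uncompressed raster of the given dimensions."""
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    width, height = 8654, 5748  # dimensions quoted in the message above
    # Assumed pixel formats; the actual format in use is not stated in the thread.
    for label, bpp in (("8-bit gray", 1), ("24-bit RGB", 3), ("32-bit ARGB", 4)):
        size = raster_bytes(width, height, bpp)
        print(f"{label}: {size} bytes ({size / 2**20:.1f} MiB)")
```

At 4 bytes per pixel this comes to about 190 MiB for one copy, so holding both a source and a destination image during a resize could plausibly exceed a small default heap. Converting to 8-bit grayscale before upscaling would cut that to a quarter under these assumptions.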

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-18 Thread Marek FlashT Rucinski
Oh, sorry for the double post... wrong key. I have to say that, for captcha recognition for example, I do resize images to 200% or even 300%; the same image without resizing does not give any results. Not sure why. Probably because the font becomes more ... "oval".
2015-01-18 19:57 GMT+01:00 Marek FlashT Ruc

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-18 Thread Marek FlashT Rucinski
Don't use the DPI metric, as it does not really matter to Tesseract. In my experience, the best results are obtained when the font size is 70-90 px (so a bit large for normal usage).
2015-01-15 1:58 GMT+01:00 Quan Nguyen:
> You can use the command combine_tessdata
>
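The advice above suggests deriving the resize factor from the text height in pixels rather than from DPI. A minimal sketch, assuming the current letter height has already been measured by some other means (that measurement is out of scope here) and taking 80 px, the midpoint of the 70-90 px range, as a target; both function names are hypothetical:

```python
def scale_for_text_height(measured_px, target_px=80.0):
    """Factor by which to resize so letters end up about target_px high."""
    if measured_px <= 0:
        raise ValueError("measured_px must be positive")
    return target_px / measured_px


def scaled_size(width, height, factor):
    """New image dimensions after applying the factor, rounded to whole pixels."""
    return max(1, round(width * factor)), max(1, round(height * factor))


if __name__ == "__main__":
    factor = scale_for_text_height(20)     # letters currently about 20 px high
    print(factor)                          # 4.0
    print(scaled_size(640, 480, factor))   # (2560, 1920)
```

This gives a per-image factor instead of a fixed 200% or 300%, which may help when input images vary widely in text size.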

[tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-14 Thread Quan Nguyen
You can use the command combine_tessdata to unpack a traineddata file and examine its components. The eng.traineddata bundled with Tess4J is version 3.01. You may want to try 3.02 and see if it can produce be

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-14 Thread Robert Komar
On Wed, 14 Jan 2015, newbie wrote:
> Flash Thunder, I think I got ahead of myself in the email below. The upscaled image has the same DPI as the original image (96 dpi). I have found, by trial and error, upscaled pixel sizes for which the OCR works without doing steps 2 and 3. But I dont a

[tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-14 Thread newbie
Flash Thunder, I think I got ahead of myself in the email below. The upscaled image has the same DPI as the original image (96 dpi). I have found, by trial and error, upscaled pixel sizes for which the OCR works without doing steps 2 and 3. But I don't have a generic formula to upscale

[tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-13 Thread newbie
Thanks, I have it working by doing simple things. 1. I needed to upscale the resolution to 300 dpi (including sharpening the image), and that did the trick.
On Monday, January 12, 2015 at 5:39:38 PM UTC-5, Flash Thunder wrote:
> It should identify those images without any problems, you just

[tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-12 Thread Flash Thunder
It should identify those images without any problems; you just need to prepare the image properly. Three steps for you:
1. Tesseract likes letters about 70-100 px high, so you need to resize your images.
2. Invert the colors - as I noticed, it doesn't like it this way at all - letters must have to
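The colour-inversion step can be sketched as follows. The message is cut off before it states the required polarity, so this example assumes the goal is dark letters on a light background and that the background covers more than half of the image; both are assumptions, and the function names are hypothetical. The remaining step is not visible in the snippet and is not reproduced here.

```python
def has_dark_background(rows, midpoint=128):
    """Guess that the background is dark if most pixels fall below midpoint."""
    flat = [p for row in rows for p in row]
    dark = sum(1 for p in flat if p < midpoint)
    return dark > len(flat) / 2


def invert(rows):
    """Invert an 8-bit grayscale image given as nested lists."""
    return [[255 - p for p in row] for row in rows]


def normalize_polarity(rows):
    """Return the image with a light background, inverting only if needed."""
    return invert(rows) if has_dark_background(rows) else rows


if __name__ == "__main__":
    white_on_black = [[0, 0, 0], [0, 255, 0]]
    print(normalize_polarity(white_on_black))
```

An image that already has a light background is returned unchanged, so the check can be applied to every input without tracking which ones were inverted.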