I found that vip1200.jpg works when scaled to a width of 8654px and a height of 5748px, but most of the time I get either an "Invalid mem access" error or an out-of-memory (heap) error before I can rescale to the optimal scale. I need to come up with some other generic way to upscale and OCR images. Any ideas are appreciated.
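One idea for the heap problem is to cap the scale factor by a pixel budget before the upscaled image is allocated, so a large multiple never requests more memory than the JVM can give. Below is a minimal sketch in plain Java (java.awt only, no Tess4J calls). The names `BoundedUpscale`, `capScale`, `upscale` and `MAX_PIXELS` are hypothetical, the 40-million-pixel budget is an arbitrary assumption rather than a Tesseract limit, and converting to 8-bit grayscale is my own assumption to save memory.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

final class BoundedUpscale {
    // Assumed budget: 40 million pixels is roughly 40 MB as 8-bit grayscale.
    static final long MAX_PIXELS = 40_000_000L;

    // Returns the requested scale, reduced if the scaled image would
    // exceed maxPixels. Aspect ratio is preserved (one factor for both axes).
    static double capScale(int width, int height, double requestedScale, long maxPixels) {
        double maxScale = Math.sqrt((double) maxPixels / ((double) width * height));
        return Math.min(requestedScale, maxScale);
    }

    // Upscales src by at most requestedScale, staying within MAX_PIXELS.
    static BufferedImage upscale(BufferedImage src, double requestedScale) {
        double scale = capScale(src.getWidth(), src.getHeight(), requestedScale, MAX_PIXELS);
        // Round down so the result never goes over the budget.
        int w = Math.max(1, (int) Math.floor(src.getWidth() * scale));
        int h = Math.max(1, (int) Math.floor(src.getHeight() * scale));
        // Assumption: grayscale is enough for OCR and uses 1 byte per pixel.
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return dst;
    }
}
```

For a 67x29 image a requested scale of 10 is left alone, while a 1000x1000 image asked to scale by 100 is cut back to about 6.3. Whether the capped result is still large enough for the OCR to read is something that would have to be checked per image.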
On Tuesday, January 20, 2015 at 11:38:54 AM UTC-5, newbie wrote:
>
> Thanks to all who have taken the time to respond.
>
> This is what I am trying to do now: I upscale the image, feed it to
> the OCR, and run the result against a dictionary of words I have. If it
> does not match, I iteratively upscale and feed it to the OCR again. I
> cannot upscale it very far, as there are 3 problems:
>
> 1. The text I am looking for gets very blurred and the OCR fails.
> 2. I run out of memory upscaling (I have the heap size increased to the
>    max).
> 3. This process is time consuming.
>
> My upscale multiple (by how many pixels I upscale the entire image) is
> also set based on the max dimension of the original image (i.e. if the
> vertical dimension is larger, the vertical pixel count becomes my max
> dimension, likewise with horizontal; e.g. height is 29 and width is 67,
> so max dimension = 67).
>
> if (maxDimension < 100)
>     scaledMultiple = 10;
> else if (maxDimension < 1000)
>     scaledMultiple = 50;
> else
>     scaledMultiple = 100;
>
> This works for most of the images I have currently, but fails for a few. I
> will attach the failing ones (it needs to read VIP1200 in VIP1200R.png and
> VIP1200R_cropped). I would appreciate it if any of you could tell me how I
> can get this to work, or whether there is another way to go about this, as
> my images vary drastically in size. (Of course I have put forward the
> suggestion of cropping the model number within a text box, as Allistair
> suggested, and they are mulling it over, so I guess the idea is not well
> received.)
>
> I do maintain the aspect ratio of the original image when I upscale, so
> the text is not ovalized; maybe I should try that? Also, I am now
> converting JPG files to PNG; do you know which format works best?
>
> Thanks, appreciate it.
>
> On Sunday, January 18, 2015 at 1:59:28 PM UTC-5, Flash Thunder wrote:
>>
>> Oh, sorry for double post... wrong key. I have to say that, for example,
>> for captcha recognition I do resize images to 200% or even 300%... the
>> same image not resized does not give any results. Not sure why. Probably
>> because the font changes to more ... "oval".
>>
>> 2015-01-18 19:57 GMT+01:00 Marek FlashT Rucinski <przys...@gmail.com>:
>>
>>> Don't use the DPI metric, as it does not really count for Tesseract. The
>>> best results (that is, from my experience) are obtained when the font
>>> size is 70-90px (so it is a bit large for normal usage).
>>>
>>> 2015-01-15 1:58 GMT+01:00 Quan Nguyen <nguy...@gmail.com>:
>>>
>>>> You can use the command combine_tessdata
>>>> <http://tesseract-ocr.googlecode.com/svn-history/trunk/doc/combine_tessdata.1.html>
>>>> to unpack a traineddata file to examine its components.
>>>>
>>>> The eng.traineddata bundled with Tess4J is of version 3.01. You may
>>>> want to try 3.02 and see if it can produce better results for you (check
>>>> https://code.google.com/p/tesseract-ocr/downloads/list).
>>>>
>>>> On Monday, January 12, 2015 at 10:18:18 AM UTC-6, newbie wrote:
>>>>>
>>>>> Does anyone know whether tessdata/eng.traineddata (the final crunched
>>>>> data) in tess4j comes with all of the files below included?
>>>>>
>>>>> - tessdata/eng.config
>>>>> - tessdata/eng.unicharset
>>>>> - tessdata/eng.unicharambigs
>>>>> - tessdata/eng.inttemp
>>>>> - tessdata/eng.pffmtable
>>>>> - tessdata/eng.normproto
>>>>> - tessdata/eng.punc-dawg
>>>>> - tessdata/eng.word-dawg
>>>>> - tessdata/eng.number-dawg
>>>>> - tessdata/eng.freq-dawg
>>>>>
>>>>> Also, is this enough to identify any of the normal fonts (images
>>>>> attached)? Appreciate your help.
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/991f0517-29d9-440b-97e4-8e2616c30033%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
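
Following up on Marek's point in the quoted thread that results are best when the font size is about 70-90px: if the text height in the source image can be measured or estimated, the scale factor can be computed in one step instead of being found by repeated upscaling. A minimal sketch in Java, assuming the caller supplies the text height in pixels (how to measure it is not covered here). `TextHeightScale`, `scaleForTextHeight` and the 80px target (the midpoint of the quoted range) are hypothetical choices of mine.

```java
final class TextHeightScale {
    // Assumed target: midpoint of the 70-90px range quoted in the thread.
    static final double TARGET_TEXT_HEIGHT_PX = 80.0;

    // Returns the factor by which to scale the whole image so that text of
    // the given pixel height ends up about TARGET_TEXT_HEIGHT_PX tall.
    static double scaleForTextHeight(double measuredTextHeightPx) {
        if (measuredTextHeightPx <= 0) {
            throw new IllegalArgumentException("text height must be positive");
        }
        return TARGET_TEXT_HEIGHT_PX / measuredTextHeightPx;
    }
}
```

For text that is 20px tall this gives a factor of 4. Using the same factor for width and height keeps the aspect ratio, as in the approach described in the thread.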