Have you looked at ImageMagick and related scripts for pre-processing the images?
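
If calling an external tool from your pipeline is not convenient, a similar pre-processing step can be sketched in plain Java. This is only a rough sketch under my own assumptions, not a tested recipe for your images: instead of upscaling iteratively, it picks a single scale factor that aims for the 70-90px text height suggested in the quoted thread, and it caps the output size so the upscale stays within a pixel budget. The class name `OcrUpscale`, the `MAX_PIXELS` value, and the `textHeightPx` estimate are all hypothetical; you would have to supply the text-height estimate yourself.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

class OcrUpscale {
    // Assumed pixel budget for the upscaled image; tune it to your heap size.
    static final long MAX_PIXELS = 20_000_000L;

    /**
     * Returns the factor that brings text of the estimated height up to
     * targetTextPx. The factor is clamped so that upscaling never pushes the
     * image past MAX_PIXELS, and it is never below 1.0 (no downscaling).
     */
    static double scaleFactor(int width, int height, int textHeightPx, int targetTextPx) {
        double wanted = (double) targetTextPx / textHeightPx;
        double maxByBudget = Math.sqrt((double) MAX_PIXELS / ((double) width * height));
        return Math.max(1.0, Math.min(wanted, maxByBudget));
    }

    /**
     * Scales both dimensions by the same factor (aspect ratio is preserved)
     * using bicubic interpolation. The output is 8-bit grayscale, which needs
     * less memory than an RGB image of the same size.
     */
    static BufferedImage upscale(BufferedImage src, double factor) {
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

For a 67x29 image with text roughly 20px tall, `scaleFactor(67, 29, 20, 80)` gives 4.0, so `upscale` returns a 268x116 grayscale image. An image that is already over the budget, such as 8654x5748, gets a factor of 1.0 and is left at its original size.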

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, Jan 21, 2015 at 1:30 AM, newbie <spens.mallang...@gmail.com> wrote:

> I found that vip1200.jpg works at a scaled width of 8654px and height of
> 5748px, but most of the time I get either an "Invalid mem access" or an
> out-of-memory (heap) error before I am able to rescale to the optimal scale.
> I need to come up with some other generic way to upscale and OCR images.
> Any ideas are appreciated.
>
> On Tuesday, January 20, 2015 at 11:38:54 AM UTC-5, newbie wrote:
>>
>> Thanks to all who have taken the time to respond.
>>
>> This is what I am trying to do now: I upscale the image, feed it to the
>> OCR, and then check the result against a dictionary of words I have. If it
>> does not match, I iteratively upscale and feed it to the OCR again. I
>> cannot upscale it very far, as there are 3 problems:
>>
>> 1. The text I am looking for gets very blurred and the OCR fails.
>> 2. I run out of memory when upscaling (I have the heap size increased to
>>    the max).
>> 3. This process is time-consuming.
>>
>> My upscale multiple (by how many pixels I upscale the entire image) is
>> also set based on the max dimension of the original image (i.e. if the
>> vertical dimension is larger, then the vertical pixel count becomes my max
>> dimension, and likewise for horizontal; e.g. if height is 29 and width is
>> 67, max dimension = 67).
>>
>>     if (maxDimension < 100)
>>         scaledMultiple = 10;
>>     else if (maxDimension > 100 && maxDimension < 1000)
>>         scaledMultiple = 50;
>>     else if (maxDimension > 1000)
>>         scaledMultiple = 100;
>>
>> This works for most of the images I have currently, but fails for a few.
>> I will attach the failing ones (it needs to read VIP1200 in VIP1200R.png
>> and VIP1200R_cropped). I would appreciate it if any of you could tell me
>> how I can get this to work. I would also like to know if there is another
>> way to go about this, as my images vary drastically in size. (Of course I
>> have put forward the suggestion of cropping the model number within a text
>> box, as Allistair has suggested, and they are mulling it over, so I guess
>> the idea is not well received.)
>>
>> I do maintain the aspect ratio of the original image when I upscale, so
>> the text is not "ovalized"; maybe I should try that? Also, I am now
>> converting JPG files to PNG. Do you know which format works best?
>>
>> Appreciate it.
>>
>> On Sunday, January 18, 2015 at 1:59:28 PM UTC-5, Flash Thunder wrote:
>>>
>>> Oh, sorry for the double post... wrong key. I have to say that, for
>>> example, for captcha recognition I do resize images to 200% or even
>>> 300%. The same image not resized does not give any results. Not sure
>>> why. Probably because the font becomes more... "oval".
>>>
>>> 2015-01-18 19:57 GMT+01:00 Marek FlashT Rucinski <przys...@gmail.com>:
>>>
>>>> Don't use the DPI metric, as it does not really count for Tesseract.
>>>> The best results (in my experience) are obtained when the font size is
>>>> 70-90px (so it is a bit large for normal usage).
>>>>
>>>> 2015-01-15 1:58 GMT+01:00 Quan Nguyen <nguy...@gmail.com>:
>>>>
>>>>> You can use the command combine_tessdata
>>>>> <http://tesseract-ocr.googlecode.com/svn-history/trunk/doc/combine_tessdata.1.html>
>>>>> to unpack a traineddata file and examine its components.
>>>>>
>>>>> The eng.traineddata bundled with Tess4J is version 3.01. You may want
>>>>> to try 3.02 and see if it produces better results for you (check
>>>>> https://code.google.com/p/tesseract-ocr/downloads/list).
>>>>>
>>>>> On Monday, January 12, 2015 at 10:18:18 AM UTC-6, newbie wrote:
>>>>>>
>>>>>> Does anyone know whether tessdata/eng.traineddata (the final
>>>>>> crunched data) in Tess4J comes with all of the files below included?
>>>>>>
>>>>>> - tessdata/eng.config
>>>>>> - tessdata/eng.unicharset
>>>>>> - tessdata/eng.unicharambigs
>>>>>> - tessdata/eng.inttemp
>>>>>> - tessdata/eng.pffmtable
>>>>>> - tessdata/eng.normproto
>>>>>> - tessdata/eng.punc-dawg
>>>>>> - tessdata/eng.word-dawg
>>>>>> - tessdata/eng.number-dawg
>>>>>> - tessdata/eng.freq-dawg
>>>>>>
>>>>>> Also, is this enough to identify any of the normal fonts (images
>>>>>> attached)? Appreciate your help.
>>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/991f0517-29d9-440b-97e4-8e2616c30033%40googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.