Have you looked at ImageMagick and related scripts for pre-processing the
images?
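
For instance, a minimal sketch of calling ImageMagick from Java before handing the file to Tess4J might look like the following. The class name, the output file name and the 300% factor are only illustrative assumptions; `convert`, `-colorspace Gray` and `-resize` are standard ImageMagick usage, but I have not tried this on your images.

```java
import java.util.Arrays;
import java.util.List;

class ImageMagickPreprocess {
    // Builds an ImageMagick command line that converts the image to
    // grayscale and enlarges it by the given percentage.
    // "convert" is the ImageMagick 6 binary name (an assumption about
    // the installed version; ImageMagick 7 also ships "magick").
    static List<String> buildCommand(String input, String output, int percent) {
        return Arrays.asList(
                "convert", input,
                "-colorspace", "Gray",
                "-resize", percent + "%",
                output);
    }

    public static void main(String[] args) {
        // Output file name is hypothetical.
        List<String> cmd = buildCommand("VIP1200R.png", "VIP1200R_pre.png", 300);
        System.out.println(String.join(" ", cmd));
        // To actually run it (requires ImageMagick on the PATH):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Doing the resize in an external process also keeps the large intermediate image out of the Java heap.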

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, Jan 21, 2015 at 1:30 AM, newbie <spens.mallang...@gmail.com> wrote:

> I found that vip1200.jpg works at a scaled width of 8654 px and a height
> of 5748 px, but most of the time I get either an "Invalid mem access" or an
> out-of-memory (heap) error before I can rescale to the optimal size. I need
> to come up with some other generic way to upscale and OCR images. Any ideas
> are appreciated.
>
> On Tuesday, January 20, 2015 at 11:38:54 AM UTC-5, newbie wrote:
>>
>> Thanks to everyone who has taken the time to respond.
>>
>> This is what I am trying now: I upscale the image, feed it to the OCR
>> engine, and check the result against a dictionary of words I have. If it
>> does not match, I iteratively upscale and feed it to the OCR again. I
>> cannot upscale very far, because there are 3 problems.
>>
>> 1. The text I am looking for gets very blurred and the OCR fails.
>> 2. I run out of memory while upscaling (I have already increased the heap
>> size to the max).
>> 3. The process is time-consuming.
>>
>> My upscale multiple (by how many pixels I upscale the entire image) is
>> also set based on the max dimension of the original image (i.e., if the
>> vertical dimension is larger, the vertical pixel count becomes my max
>> dimension, and likewise for horizontal; e.g. if height is 29 and width
>> is 67, max dimension = 67).
>>
>> // Pick the upscale multiple from the larger image dimension.
>> if (maxDimension < 100) {
>>     scaledMultiple = 10;
>> } else if (maxDimension < 1000) {  // also covers maxDimension == 100
>>     scaledMultiple = 50;
>> } else {                           // also covers maxDimension == 1000
>>     scaledMultiple = 100;
>> }
>>
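
One way to avoid the heap errors described here is to cap the upscale factor by a pixel budget before allocating the enlarged image. The following is only a sketch under assumptions: the class name, the `MAX_PIXELS` value and the choice of bicubic interpolation are mine, not something from this thread, and the budget would need tuning to the actual heap size.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

class BoundedUpscale {
    // Hypothetical budget: maximum number of pixels in the upscaled image.
    static final long MAX_PIXELS = 20_000_000L;

    // Returns the requested factor, reduced if necessary so that
    // (width * factor) * (height * factor) stays within MAX_PIXELS.
    // Never returns less than 1.0 (no downscaling).
    static double clampFactor(int width, int height, double requested) {
        double maxFactor = Math.sqrt((double) MAX_PIXELS / ((double) width * height));
        return Math.max(1.0, Math.min(requested, maxFactor));
    }

    // Upscales into an 8-bit grayscale image, keeping the aspect ratio.
    static BufferedImage upscale(BufferedImage src, double factor) {
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return dst;
    }
}
```

With this budget, a 67 x 29 image can still be enlarged 10 times, while a 1000 x 1000 image is limited to a factor of roughly 4.5. Using one byte per pixel (`TYPE_BYTE_GRAY`) instead of an RGB type also reduces the memory needed for the result.
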
>> This works for most of the images I currently have, but fails for a few.
>> I will attach the failing ones (the OCR needs to read VIP1200 in
>> VIP1200R.png and VIP1200R_cropped). I would appreciate it if any of you
>> could tell me how I can get this to work, or whether there is another way
>> to go about this, as my images vary drastically in size. (Of course I have
>> put forward the suggestion of cropping the model number within a text box,
>> as Allistair suggested, and they are mulling it over, so I guess the idea
>> is not well received.)
>>
>> I do maintain the aspect ratio of the original image when I upscale, so
>> the text is not "ovalized"; maybe I should try that? Also, I am now
>> converting JPG files to PNG. Do you know which format works best?
>> Thanks
>>
>> Appreciate it.
>>
>>
>>
>> On Sunday, January 18, 2015 at 1:59:28 PM UTC-5, Flash Thunder wrote:
>>>
>>> Oh, sorry for the double post... wrong key. I have to say that, for
>>> example, for captcha recognition I resize images to 200% or even 300%;
>>> the same image not resized does not give any results. Not sure why.
>>> Probably because the font becomes more ... "oval".
>>>
>>> 2015-01-18 19:57 GMT+01:00 Marek FlashT Rucinski <przys...@gmail.com>:
>>>
>>>> Don't use the DPI metric, as it does not really matter to Tesseract. In
>>>> my experience, the best results are obtained when the font size is
>>>> 70-90px (so it is a bit large for normal usage).
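
The 70-90px suggestion above can be turned into a scale factor instead of a fixed multiple. This is a minimal sketch, assuming the pixel height of the text has already been measured somehow (for example by hand on a sample image); the class name, the method name and the choice of 80 px as the target are illustrative assumptions.

```java
class TargetTextHeight {
    // Midpoint of the 70-90 px range mentioned above.
    static final double TARGET_PX = 80.0;

    // Scale factor that brings text of the measured height to TARGET_PX.
    static double scaleFor(double measuredTextHeightPx) {
        if (measuredTextHeightPx <= 0) {
            throw new IllegalArgumentException("text height must be positive");
        }
        return TARGET_PX / measuredTextHeightPx;
    }
}
```

For text that is about 20 px tall, this gives a factor of 4, which is far smaller than a multiple chosen from the overall image dimensions.
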
>>>>
>>>> 2015-01-15 1:58 GMT+01:00 Quan Nguyen <nguy...@gmail.com>:
>>>>
>>>>> You can use the command combine_tessdata
>>>>> <http://tesseract-ocr.googlecode.com/svn-history/trunk/doc/combine_tessdata.1.html>
>>>>> to unpack a traineddata file to examine its components.
>>>>>
>>>>> The eng.traineddata bundled with Tess4J is version 3.01. You may
>>>>> want to try 3.02 and see if it produces better results for you (check
>>>>> https://code.google.com/p/tesseract-ocr/downloads/list).
>>>>>
>>>>> On Monday, January 12, 2015 at 10:18:18 AM UTC-6, newbie wrote:
>>>>>>
>>>>>> Does anyone know whether tessdata/eng.traineddata (the final
>>>>>> crunched data) in Tess4J comes with all the files below included?
>>>>>>
>>>>>>
>>>>>>    - tessdata/eng.config
>>>>>>    - tessdata/eng.unicharset
>>>>>>    - tessdata/eng.unicharambigs
>>>>>>    - tessdata/eng.inttemp
>>>>>>    - tessdata/eng.pffmtable
>>>>>>    - tessdata/eng.normproto
>>>>>>    - tessdata/eng.punc-dawg
>>>>>>    - tessdata/eng.word-dawg
>>>>>>    - tessdata/eng.number-dawg
>>>>>>    - tessdata/eng.freq-dawg
>>>>>>
>>>>>> Also, is this enough to identify any of the normal fonts (images
>>>>>> attached)? I appreciate your help.
>>>>>>
>>>>>  --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/991f0517-29d9-440b-97e4-8e2616c30033%40googlegroups.com.
>>>>>
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
