Thanks. I tried scaling the image horizontally, with several different widths 
and heights, and the best Tesseract could do for 09:43 AM was  

*ocr string = @9243 RH*
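
A minimal sketch of what "scaling horizontally only" means, for anyone 
following along. It is a dependency-free illustration and not the code 
actually used on the stub: it assumes the image is held as a list of pixel 
rows, uses nearest-neighbour sampling, and `stretch_horizontally` is a made-up 
name. In a real pipeline an image library's resize call would do this job.

```python
def stretch_horizontally(rows, factor):
    """Return a copy of `rows` widened by `factor`; the height is unchanged.

    `rows` is a list of equal-length pixel rows (a hypothetical in-memory
    representation). Nearest-neighbour sampling picks each source pixel.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not rows:
        return []
    src_width = len(rows[0])
    if src_width == 0:
        return [[] for _ in rows]
    dst_width = max(1, round(src_width * factor))
    # Map each destination column back to the source column it samples.
    col_map = [min(src_width - 1, int(x * src_width / dst_width))
               for x in range(dst_width)]
    return [[row[c] for c in col_map] for row in rows]


# Toy 2x3 "image": 0 = black, 255 = white.
image = [[0, 255, 0],
         [255, 0, 255]]
wide = stretch_horizontally(image, 2.0)
print(wide)
```

Only the width changes, so horizontally compressed glyphs are stretched out 
while the stroke height stays the same.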


I'll check out the threads on LCD/clock type reading. Thanks for the 
pointer. 
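
Since the only text of interest is a time in "HH:MM AM/PM" form, a format 
check could at least reject readings like the ones in this thread before 
they are used. The sketch below assumes a 12-hour time with a leading zero, 
as in 09:43 AM; `TIME_PATTERN` and `looks_like_time` are hypothetical names 
and not part of Tesseract.

```python
import re

# Hours 01-12, minutes 00-59, then AM or PM (assumed stub format).
TIME_PATTERN = re.compile(r"(0[1-9]|1[0-2]):([0-5][0-9]) (AM|PM)")


def looks_like_time(text):
    """Return True if `text` is exactly a 12-hour HH:MM AM/PM time."""
    return TIME_PATTERN.fullmatch(text.strip()) is not None


# The expected string and the two OCR readings reported in this thread.
for candidate in ["09:43 AM", "@9243 RH", "39:43 HH"]:
    print(candidate, looks_like_time(candidate))
```

This does not improve recognition; it only flags an output that cannot be a 
valid time, so the image can be retried with different preprocessing.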


-amit

On Thursday, August 20, 2015 at 8:09:58 AM UTC-4, Allistair C wrote:
>
> So another thing you could try ... I notice that everything is 
> horizontally compressed. You could try scaling the image horizontally only 
> to stretch things out (like I attach). 
>
> This would then make the problem similar to those looking to read e.g. 
> digital clock text - there are a variety of threads on this group about 
> LCD/clock type reading that may then reveal further things you could do 
> from that point.
>
>
>
> On 20 August 2015 at 13:03, Allistair <alli...@gmail.com> wrote:
>
>> The font does not look like that. Look at the shape of the 0, which has a 
>> strikethrough in your image but not in Lucida, or at the shape of the M. 
>> I am not sure font training will do a lot here. I think it's more the 
>> quality of the edges in your image, with the dot matrix printing (or 
>> however it's printed) producing uncertain edges. 
>>
>> Perhaps others can chip in.
>>
>> On 20 August 2015 at 10:31, Amit Rao <rao....@gmail.com> wrote:
>>
>>> Thanks, Allistair. I was guessing that this font was similar to Lucida 
>>> Console, e.g.:
>>>
>>>
>>> https://www.google.com/search?q=lucida+console+font&espv=2&biw=1174&bih=761&tbm=isch&tbo=u&source=univ&sa=X&sqi=2&ved=0CCUQsARqFQoTCKrkzcypt8cCFQddHgod7XsPDw#imgrc=H27K5k9g7hx19M%3A
>>>
>>> However, I don't know for certain what font this is, and I don't know of 
>>> a tool that would identify the font used in the image. The only text I am 
>>> really interested in is "HH:MM AM/PM", but even if I crop the image to 
>>> include only the time, Tesseract is still not able to read it, similar to 
>>> what you reported. I cropped the image to include 09:43 AM and it reads it 
>>> as  *@9243 Rh*
>>>
>>> If this is a font that Tesseract does not recognize, would it help to 
>>> augment the training data set with images in this format and font? 
>>>
>>> Thanks,
>>> amit
>>>
>>>
>>>
>>> On Thursday, August 20, 2015 at 4:34:25 AM UTC-4, Allistair C wrote:
>>>>
>>>> Which Lucida font do you think this is? All the Lucida fonts I see in a 
>>>> Google Image search look nothing like this.
>>>>
>>>> You're right, this does not OCR well. In fact, if you just crop out a 
>>>> part of it to remove other noise, say, 09:43 AM, even with lots of margin 
>>>> Tesseract isn't even finding anything it thinks looks like text in normal 
>>>> page segmentation.
>>>>
>>>> The best I got (for the cropped out time) was:
>>>>
>>>> 39:43 HH
>>>>
>>>> So 28% incorrect.
>>>>
>>>> The definition of the 'M' is quite eroded already which is not great.
>>>>
>>>>
>>>>
>>>> On 20 August 2015 at 08:29, Amit Rao <rao....@gmail.com> wrote:
>>>>
>>>>> Hi folks, 
>>>>>
>>>>> I am using the Tesseract iOS SDK to OCR parking stubs. The parking stubs 
>>>>> are primarily in 2 formats. Tesseract does quite well on one of the 
>>>>> formats, but the OCR text for the second format is pretty much useless. 
>>>>> I have attached the image that Tesseract is unable to OCR. If someone is 
>>>>> able to report any success with OCRing this image, I would really 
>>>>> appreciate it. So far I have tried the following, but none of it has 
>>>>> improved the OCR results.
>>>>>
>>>>> 1. Cropping the image
>>>>> 2. Reducing the height and width of the image, with the same or a 
>>>>> different aspect ratio
>>>>> 3. Binarizing the image into black and white
>>>>> 4. Filtering the image to smooth it
>>>>>
>>>>> I haven't tried augmenting the training data set yet. The font seems 
>>>>> to be pretty standard (Lucida), and my understanding is that augmenting 
>>>>> the training data is not very useful unless the fonts are non-standard. 
>>>>>
>>>>> Your help/suggestions will be greatly appreciated. 
>>>>>
>>>>> Thank you,
>>>>> Amit Rao
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/f7ed92d0-6448-48c8-a404-774965d9b35a%40googlegroups.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/f7ed92d0-6448-48c8-a404-774965d9b35a%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>
>>
>
