Thanks. I tried scaling the image horizontally with different widths and heights, and the best Tesseract could do for 09:43 AM was

*ocr string = @9243 RH*

I'll check out the threads on LCD/clock type reading. Thanks for the pointer.

-amit

On Thursday, August 20, 2015 at 8:09:58 AM UTC-4, Allistair C wrote:
>
> So another thing you could try ... I notice that everything is
> horizontally compressed. You could try scaling the image horizontally only
> to stretch things out (like I attach).
>
> This would then make the problem similar to those looking to read e.g.
> digital clock text - there are a variety of threads on this group about
> LCD/clock type reading that may then reveal further things you could do
> from that point.
>
> On 20 August 2015 at 13:03, Allistair <alli...@gmail.com> wrote:
>
>> The font does not look like that - look at the shape of the 0, which has a
>> strikethrough in your image but not in Lucida, or the M shape. I am not
>> sure font training will do a lot here, I think it's more the quality of the
>> edges in your image due to the dot matrix printing or however it's printed
>> producing uncertain edges.
>>
>> Perhaps others can chip in.
>>
>> On 20 August 2015 at 10:31, Amit Rao <rao....@gmail.com> wrote:
>>
>>> Thanks, Allistair. I was guessing that this font was similar to Lucida
>>> Console. e.g.
>>>
>>> https://www.google.com/search?q=lucida+console+font&espv=2&biw=1174&bih=761&tbm=isch&tbo=u&source=univ&sa=X&sqi=2&ved=0CCUQsARqFQoTCKrkzcypt8cCFQddHgod7XsPDw#imgrc=H27K5k9g7hx19M%3A
>>>
>>> However, I don't know for certain what font this is and I don't know of
>>> a tool that will help me know for sure which font the image uses. The only
>>> text I am really interested in is "HH:MM AM/PM" but if I crop the image to
>>> include only the time Tesseract is still not able to read it, similar to
>>> what you reported.
>>> I cropped the image to include 09:43 AM and it reads it
>>> as *@9243 Rh*
>>>
>>> If this is a font that Tesseract does not recognize, would it help to
>>> augment the training data set with data from images with this format and
>>> font?
>>>
>>> Thanks,
>>> amit
>>>
>>> On Thursday, August 20, 2015 at 4:34:25 AM UTC-4, Allistair C wrote:
>>>>
>>>> Which Lucida font do you think this is? All Lucida fonts I see in a
>>>> Google Image search are nothing like this.
>>>>
>>>> You're right, this does not OCR well. In fact, if you just crop out a
>>>> part of it to remove other noise, say, 09:43 AM, even with lots of margin
>>>> Tesseract isn't even finding anything it thinks looks like text in normal
>>>> page segmentation.
>>>>
>>>> The best I got (for the cropped out time) was:
>>>>
>>>> 39:43 HH
>>>>
>>>> So 28% incorrect.
>>>>
>>>> The definition of the 'M' is quite eroded already which is not great.
>>>>
>>>> On 20 August 2015 at 08:29, Amit Rao <rao....@gmail.com> wrote:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> I am using the Tesseract iOS SDK to OCR parking stubs. The parking stubs
>>>>> are primarily in 2 formats. Tesseract does quite well on one of the
>>>>> formats, but the OCR text for the second format is pretty much useless.
>>>>> I have attached the image that Tesseract is unable to OCR. If someone is
>>>>> able to report any success with OCRing this image I would really
>>>>> appreciate it. So far I have tried the following, but none of it helps
>>>>> with the OCR results.
>>>>>
>>>>> 1. Cropping the image
>>>>> 2. Reducing the height and width of the image with same/different
>>>>>    aspect ratio
>>>>> 3. Binarizing the image into black and white
>>>>> 4. Filtering the image to smooth it
>>>>>
>>>>> I haven't tried augmenting the training data set yet. The font seems
>>>>> to be pretty standard (Lucida) and my understanding is that unless the
>>>>> fonts are non-standard, augmenting the training data will not be very
>>>>> useful.
>>>>>
>>>>> Your help/suggestions will be greatly appreciated.
>>>>>
>>>>> Thank you,
>>>>> Amit Rao
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/f7ed92d0-6448-48c8-a404-774965d9b35a%40googlegroups.com
>>>>> For more options, visit https://groups.google.com/d/optout.
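For anyone comparing the OCR attempts in this thread, here is a minimal sketch of one way to put a number on how far each result is from the expected "09:43 AM". It uses character-level edit distance (Levenshtein) divided by the length of the expected string. That metric is an assumption on my part; the thread does not say how its own percentage was computed, so the figures may not line up with it. The function names `edit_distance` and `char_error_rate` are just illustrative helpers, not part of Tesseract.

```python
def edit_distance(expected: str, actual: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions turning expected into actual."""
    # previous[j] holds the distance between the processed prefix of
    # `expected` and the first j characters of `actual`.
    previous = list(range(len(actual) + 1))
    for i, e_char in enumerate(expected, start=1):
        current = [i]
        for j, a_char in enumerate(actual, start=1):
            substitution_cost = 0 if e_char == a_char else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete e_char
                    current[j - 1] + 1,                   # insert a_char
                    previous[j - 1] + substitution_cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


def char_error_rate(expected: str, actual: str) -> float:
    """Edit distance normalized by the length of the expected string.
    Spaces count as characters and the comparison is case-sensitive."""
    if not expected:
        raise ValueError("expected string must not be empty")
    return edit_distance(expected, actual) / len(expected)


if __name__ == "__main__":
    expected = "09:43 AM"
    # OCR outputs reported earlier in this thread.
    for ocr_output in ("39:43 HH", "@9243 RH", "@9243 Rh"):
        edits = edit_distance(expected, ocr_output)
        rate = char_error_rate(expected, ocr_output)
        print(f"{ocr_output!r}: {edits} edits, {rate:.1%} character error rate")
```

Under this metric, "39:43 HH" is 3 edits away from "09:43 AM" and "@9243 RH" is 4 edits away, out of 8 characters with the space counted. Running the same check after each preprocessing change (cropping, horizontal stretch, binarizing) gives a consistent way to tell whether a change actually helped.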