PSM, sorry: page segmentation mode.

Sent from my iPhone
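The suggestion to try other page segmentation modes can be scripted as a small sweep over the Tesseract command-line tool. This is a minimal sketch, not what anyone in the thread ran. It assumes a desktop `tesseract` binary on the PATH (the thread itself uses the iOS SDK) and the 4.x+ `--psm` flag spelling. Releases from the 3.x era, current when this thread was written, spelled it `-psm`. The helper names `build_tesseract_cmd` and `sweep_psm` and the set of modes tried are illustrative choices.

```python
from __future__ import annotations

import shutil
import subprocess

# Page segmentation modes worth comparing on a cropped "HH:MM AM/PM" region:
#   3 = fully automatic page segmentation (the default)
#   6 = assume a single uniform block of text
#   7 = treat the image as a single text line
#   8 = treat the image as a single word
CANDIDATE_PSMS = (3, 6, 7, 8)


def build_tesseract_cmd(image_path: str, psm: int) -> list[str]:
    """Build a Tesseract CLI call that writes the OCR text to stdout.

    Assumes the 4.x+ flag spelling `--psm`; 3.x releases used `-psm`.
    """
    return ["tesseract", image_path, "stdout", "--psm", str(psm)]


def sweep_psm(image_path: str, psms: tuple[int, ...] = CANDIDATE_PSMS) -> dict[int, str]:
    """Run Tesseract once per PSM and return {psm: recognised text}."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    results: dict[int, str] = {}
    for psm in psms:
        proc = subprocess.run(
            build_tesseract_cmd(image_path, psm),
            capture_output=True,
            text=True,
            check=False,  # keep going even if one mode fails
        )
        results[psm] = proc.stdout.strip()
    return results
```

Calling `sweep_psm("time_crop.png")` (a hypothetical crop containing only "09:43 AM") returns one recognised string per mode. This makes it easy to compare PSM 6 (single block) against PSM 7 (single line) on the same image.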
> On 21 Aug 2015, at 02:48, Amit Rao <rao.a...@gmail.com> wrote:
>
> psr?
>
>> On Thursday, August 20, 2015 at 3:02:02 PM UTC-4, Allistair C wrote:
>> Try different psr too - I got close with psr 6.
>>
>>> On 20 Aug 2015, at 19:59, Amit Rao <rao....@gmail.com> wrote:
>>>
>>> Thanks. I tried scaling the image horizontally, trying different widths and
>>> heights, and the best Tesseract could do for 09:43 AM was
>>>
>>> ocr string = @9243 RH
>>>
>>> I'll check out the threads on LCD/clock type reading. Thanks for the
>>> pointer.
>>>
>>> -amit
>>>
>>>> On Thursday, August 20, 2015 at 8:09:58 AM UTC-4, Allistair C wrote:
>>>> So another thing you could try: I notice that everything is
>>>> horizontally compressed. You could try scaling the image horizontally only
>>>> to stretch things out (like the image I attached).
>>>>
>>>> This would then make the problem similar to reading, e.g., digital clock
>>>> text. There are a number of threads on this group about LCD/clock type
>>>> reading that may suggest further things you could do from that point.
>>>>
>>>>> On 20 August 2015 at 13:03, Allistair <alli...@gmail.com> wrote:
>>>>> The font does not look like that. Look at the shape of the 0, which has a
>>>>> strikethrough in your image but not in Lucida, or at the M shape. I am not
>>>>> sure font training will do a lot here; I think it's more the quality of
>>>>> the edges in your image, since the dot matrix printing (or however it's
>>>>> printed) produces uncertain edges.
>>>>>
>>>>> Perhaps others can chip in.
>>>>>
>>>>>> On 20 August 2015 at 10:31, Amit Rao <rao....@gmail.com> wrote:
>>>>>> Thanks, Allistair. I was guessing that this font was similar to Lucida
>>>>>> Console, e.g.
>>>>>>
>>>>>> https://www.google.com/search?q=lucida+console+font&espv=2&biw=1174&bih=761&tbm=isch&tbo=u&source=univ&sa=X&sqi=2&ved=0CCUQsARqFQoTCKrkzcypt8cCFQddHgod7XsPDw#imgrc=H27K5k9g7hx19M%3A
>>>>>>
>>>>>> However, I don't know for certain what font this is, and I don't know of
>>>>>> a tool that would tell me for sure which font the image uses. The
>>>>>> only text I am really interested in is "HH:MM AM/PM", but even if I crop
>>>>>> the image to include only the time, Tesseract is still not able to read
>>>>>> it, similar to what you reported. I cropped the image to include 09:43 AM
>>>>>> and it reads it as @9243 Rh.
>>>>>>
>>>>>> If this is a font that Tesseract does not recognize, would it help to
>>>>>> augment the training data set with data from images with this format
>>>>>> and font?
>>>>>>
>>>>>> Thanks,
>>>>>> amit
>>>>>>
>>>>>>> On Thursday, August 20, 2015 at 4:34:25 AM UTC-4, Allistair C wrote:
>>>>>>> Which Lucida font do you think this is? None of the Lucida fonts I see
>>>>>>> in a Google Image search look anything like this.
>>>>>>>
>>>>>>> You're right, this does not OCR well. In fact, if you just crop out a
>>>>>>> part of it to remove other noise, say, 09:43 AM, even with lots of
>>>>>>> margin Tesseract doesn't find anything it thinks looks like text
>>>>>>> in normal page segmentation.
>>>>>>>
>>>>>>> The best I got (for the cropped-out time) was:
>>>>>>>
>>>>>>> 39:43 HH
>>>>>>>
>>>>>>> So 28% incorrect.
>>>>>>>
>>>>>>> The definition of the 'M' is quite eroded already, which is not great.
>>>>>>>
>>>>>>>> On 20 August 2015 at 08:29, Amit Rao <rao....@gmail.com> wrote:
>>>>>>>> Hi folks,
>>>>>>>>
>>>>>>>> I am using the Tesseract iOS SDK to OCR parking stubs. The parking
>>>>>>>> stubs come primarily in two formats. Tesseract does quite well on one
>>>>>>>> of the formats, but the OCR text for the second format is pretty much
>>>>>>>> useless. I have attached the image that Tesseract is unable to OCR.
>>>>>>>> If anyone has had any success OCRing this image, I would really
>>>>>>>> appreciate hearing about it. So far I have tried the following, but
>>>>>>>> none of it helps with the OCR results:
>>>>>>>>
>>>>>>>> 1. Cropping the image
>>>>>>>> 2. Reducing the height and width of the image with the same/different
>>>>>>>>    aspect ratio
>>>>>>>> 3. Binarizing the image into black and white
>>>>>>>> 4. Filtering the image to smooth it
>>>>>>>>
>>>>>>>> I haven't tried augmenting the training data set yet. The font seems
>>>>>>>> to be pretty standard (Lucida), and my understanding is that unless
>>>>>>>> the fonts are non-standard, augmenting the training data will not be
>>>>>>>> very useful.
>>>>>>>>
>>>>>>>> Your help/suggestions will be greatly appreciated.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Amit Rao
>>>>>>>>
>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>>>>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/f7ed92d0-6448-48c8-a404-774965d9b35a%40googlegroups.com.
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
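The preprocessing discussed in this thread (cropping to the time, stretching the image horizontally only, then binarizing) can be sketched in plain Python on a grayscale pixel grid. This is a minimal illustration under stated assumptions, not what anyone in the thread ran. The helper names `crop`, `stretch_horizontally` and `binarize`, the list-of-rows image representation, nearest-neighbour sampling and the fixed threshold of 128 are all hypothetical choices. With Pillow, the horizontal-only stretch would be `img.resize((round(img.width * factor), img.height))`.

```python
from __future__ import annotations

# Hypothetical image representation: a list of rows of grayscale values,
# 0 = black, 255 = white.


def crop(pixels: list[list[int]], left: int, top: int, right: int, bottom: int) -> list[list[int]]:
    """Keep the box [left, right) x [top, bottom), like Pillow's crop box."""
    return [row[left:right] for row in pixels[top:bottom]]


def stretch_horizontally(pixels: list[list[int]], factor: float) -> list[list[int]]:
    """Scale along x only, leaving the height unchanged.

    Nearest-neighbour sampling: output column x copies source column
    floor(x / factor), so stretched strokes stay crisp rather than blurred.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    src_w = len(pixels[0]) if pixels else 0
    dst_w = max(1, round(src_w * factor)) if src_w else 0
    return [
        [row[min(src_w - 1, int(x / factor))] for x in range(dst_w)]
        for row in pixels
    ]


def binarize(pixels: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[0 if p < threshold else 255 for p in row] for row in pixels]
```

For example, `binarize(stretch_horizontally(crop(pixels, left, top, right, bottom), 2.0))` would widen a cropped time region to twice its width before thresholding. The stretch factor is something to tune by eye, much as Amit did when trying different widths.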