Sorry, I meant psm (page segmentation mode).
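
For anyone following along, here is a rough sketch of how one might loop over several page segmentation modes from a script. It only assumes the `tesseract` command-line tool; the function names and the image file name are made up for illustration. Older 3.x builds spell the flag `-psm`, newer ones `--psm`, so the flag is a parameter. Modes 6, 7 and 8 are "single uniform block", "single text line" and "single word".

```python
import shutil
import subprocess


def build_tesseract_commands(image_path, psm_modes=(6, 7, 8), legacy_flag=False):
    """Return one tesseract command line (as an argument list) per psm mode.

    legacy_flag=True uses the single-dash "-psm" spelling of older 3.x builds.
    """
    flag = "-psm" if legacy_flag else "--psm"
    return [["tesseract", image_path, "stdout", flag, str(mode)]
            for mode in psm_modes]


def run_all(image_path, **kwargs):
    """Run every command and return {psm mode: recognized text}.

    Hypothetical helper: needs the tesseract binary on PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    results = {}
    for cmd in build_tesseract_commands(image_path, **kwargs):
        done = subprocess.run(cmd, capture_output=True, text=True)
        results[int(cmd[-1])] = done.stdout.strip()
    return results


if __name__ == "__main__":
    # "time_crop.png" is a placeholder file name, not a file from this thread.
    for cmd in build_tesseract_commands("time_crop.png"):
        print(" ".join(cmd))
```

Comparing the output of each mode side by side is usually quicker than guessing which mode suits a cropped single line.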

Sent from my iPhone

> On 21 Aug 2015, at 02:48, Amit Rao <rao.a...@gmail.com> wrote:
> 
> psr? 
> 
>> On Thursday, August 20, 2015 at 3:02:02 PM UTC-4, Allistair C wrote:
>> Try different psr too - I got close with psr 6
>> 
>>> On 20 Aug 2015, at 19:59, Amit Rao <rao....@gmail.com> wrote:
>>> 
>>> Thanks. I tried scaling the image horizontally, trying different widths and 
>>> heights, and the best Tesseract could do for 09:43 AM was:
>>> ocr string = @9243 RH
>>> 
>>> 
>>> 
>>> I'll check out the threads on LCD/clock type reading. Thanks for the 
>>> pointer. 
>>> 
>>> 
>>> 
>>> -amit
>>> 
>>> 
>>>> On Thursday, August 20, 2015 at 8:09:58 AM UTC-4, Allistair C wrote:
>>>> So another thing you could try ... I notice that everything is 
>>>> horizontally compressed. You could try scaling the image horizontally only 
>>>> to stretch things out (like I attach). 
>>>> 
>>>> This would then make the problem similar to those looking to read e.g. 
>>>> digital clock text - there are a variety of threads on this group about 
>>>> LCD/clock type reading that may then reveal further things you could do 
>>>> from that point.
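
A minimal sketch of horizontal-only stretching, assuming the image has already been loaded as a list of pixel rows (the function name is hypothetical). It repeats each pixel an integer number of times, i.e. nearest-neighbour scaling along the x axis only. With an imaging library such as Pillow the equivalent would be something like `img.resize((img.width * factor, img.height))`.

```python
def stretch_horizontally(rows, factor):
    """Widen an image by an integer factor, leaving its height unchanged.

    rows   -- list of rows, each row a list of pixel values
    factor -- how many times each pixel is repeated along the x axis
    """
    if not isinstance(factor, int) or factor < 1:
        raise ValueError("factor must be a positive integer")
    return [[pixel for pixel in row for _ in range(factor)] for row in rows]


if __name__ == "__main__":
    tiny = [[0, 255],
            [255, 0]]
    for row in stretch_horizontally(tiny, 3):
        print(row)
```

Which factor works best is something to find by experiment; nothing here says what factor suits this particular stub.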
>>>> 
>>>> 
>>>> 
>>>>> On 20 August 2015 at 13:03, Allistair <alli...@gmail.com> wrote:
>>>>> The font does not look like that. Look at the shape of the 0, which has a 
>>>>> strikethrough in your image but not in Lucida, or at the shape of the M. 
>>>>> I am not sure font training will do a lot here. I think it's more the 
>>>>> quality of the edges in your image: the dot matrix printing, or however 
>>>>> it's printed, produces uncertain edges. 
>>>>> 
>>>>> Perhaps others can chip in.
>>>>> 
>>>>>> On 20 August 2015 at 10:31, Amit Rao <rao....@gmail.com> wrote:
>>>>>> Thanks, Allistair. I was guessing that this font was similar to Lucida 
>>>>>> Console, e.g.:
>>>>>> 
>>>>>> https://www.google.com/search?q=lucida+console+font&espv=2&biw=1174&bih=761&tbm=isch&tbo=u&source=univ&sa=X&sqi=2&ved=0CCUQsARqFQoTCKrkzcypt8cCFQddHgod7XsPDw#imgrc=H27K5k9g7hx19M%3A
>>>>>> 
>>>>>> However, I don't know for certain what font this is, and I don't know of 
>>>>>> a tool that would tell me for sure which font the image uses. The 
>>>>>> only text I am really interested in is "HH:MM AM/PM", but if I crop the 
>>>>>> image to include only the time, Tesseract is still not able to read it, 
>>>>>> similar to what you reported. I cropped the image to include 09:43 AM 
>>>>>> and it reads it as @9243 Rh
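
Since only the "HH:MM AM/PM" part matters, one option is to check every OCR result against that pattern and discard anything that does not fit. The sketch below is an assumption about how that check could look; `extract_time` is a made-up name and the regular expression only covers 12-hour times.

```python
import re

# 12-hour clock: hour 1-12 (optional leading zero), minutes 00-59, AM or PM.
TIME_PATTERN = re.compile(r"(?<!\d)(0?[1-9]|1[0-2]):([0-5]\d)\s?(AM|PM)\b")


def extract_time(ocr_text):
    """Return (hour, minute, meridiem) if the text contains a valid
    12-hour time, otherwise None."""
    match = TIME_PATTERN.search(ocr_text)
    if match is None:
        return None
    hour, minute, meridiem = match.groups()
    return int(hour), int(minute), meridiem


if __name__ == "__main__":
    for candidate in ("09:43 AM", "@9243 RH", "39:43 HH"):
        print(repr(candidate), "->", extract_time(candidate))
```

This does not improve recognition itself, but it makes it easy to tell automatically whether a given preprocessing attempt produced a usable time.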
>>>>>> 
>>>>>> If this is a font that Tesseract does not recognize, would it help 
>>>>>> to augment the training data set with data from images in this format 
>>>>>> and font? 
>>>>>> 
>>>>>> Thanks,
>>>>>> amit
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Thursday, August 20, 2015 at 4:34:25 AM UTC-4, Allistair C wrote:
>>>>>>> Which Lucida font do you think this is? All the Lucida fonts I see in a 
>>>>>>> Google Image search are nothing like this.
>>>>>>> 
>>>>>>> You're right, this does not OCR well. In fact, if you just crop out a 
>>>>>>> part of it to remove other noise, say, 09:43 AM, even with lots of 
>>>>>>> margin, Tesseract doesn't find anything it thinks looks like text 
>>>>>>> in normal page segmentation.
>>>>>>> 
>>>>>>> The best I got (for the cropped out time) was:
>>>>>>> 
>>>>>>> 39:43 HH
>>>>>>> 
>>>>>>> So 3 of the 7 non-space characters (about 43%) are incorrect.
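
To put a repeatable number on results like this, a character error rate based on edit distance is one common choice. The sketch below is only one possible convention: it ignores spaces by default and divides by the length of the expected text. The function names are illustrative.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def char_error_rate(expected, actual, ignore_spaces=True):
    """Edit distance divided by the length of the expected text."""
    if ignore_spaces:
        expected = expected.replace(" ", "")
        actual = actual.replace(" ", "")
    if not expected:
        raise ValueError("expected text must not be empty")
    return levenshtein(expected, actual) / len(expected)


if __name__ == "__main__":
    for result in ("39:43 HH", "@9243 RH"):
        rate = char_error_rate("09:43 AM", result)
        print(f"{result!r}: {rate:.0%} of characters wrong")
```

Under this convention "39:43 HH" has 3 wrong characters out of 7 and "@9243 RH" has 4 out of 7; counting the space as a character would give slightly different percentages.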
>>>>>>> 
>>>>>>> The definition of the 'M' is quite eroded already which is not great.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On 20 August 2015 at 08:29, Amit Rao <rao....@gmail.com> wrote:
>>>>>>>> Hi folks, 
>>>>>>>> 
>>>>>>>> I am using the Tesseract iOS SDK to OCR parking stubs. The parking stubs 
>>>>>>>> come primarily in 2 formats. Tesseract does quite well on one of the 
>>>>>>>> formats, but the OCR text for the second format is pretty much useless. 
>>>>>>>> I have attached the image that Tesseract is unable to OCR. If someone is 
>>>>>>>> able to report any success with OCRing this image, I would really 
>>>>>>>> appreciate it. So far I have tried the following, but none of it 
>>>>>>>> improves the OCR results.
>>>>>>>> 
>>>>>>>> 1. Cropping the image
>>>>>>>> 2. Reducing the height and width of the image, with the same or a 
>>>>>>>> different aspect ratio
>>>>>>>> 3. Binarizing the image into black and white
>>>>>>>> 4. Smoothing the image with a filter. 
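
For reference, the binarization step in the list above could be sketched as follows. This uses Otsu's method to pick a global threshold, which is an assumption on my part; the thread does not say how the image was actually binarized. The image is taken to be a flat list of 8-bit grayscale values, and both function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Pick the threshold (0-255) that maximizes the between-class variance
    of the 'dark' (<= threshold) and 'light' (> threshold) pixel groups."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels at or below the candidate threshold
    sum_dark = 0    # sum of their gray values
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue            # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break               # every pixel is already in the dark group
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, threshold=None):
    """Map each pixel to 0 (black) or 255 (white)."""
    if threshold is None:
        threshold = otsu_threshold(pixels)
    return [255 if p > threshold else 0 for p in pixels]


if __name__ == "__main__":
    sample = [10, 12, 11, 200, 205, 198]   # made-up gray values
    print(otsu_threshold(sample), binarize(sample))
```

A single global threshold can struggle on unevenly lit photos, so a locally adaptive threshold may be worth trying if this alone does not help.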
>>>>>>>> 
>>>>>>>> I haven't tried augmenting the training data set yet. The font seems 
>>>>>>>> to be pretty standard (Lucida), and my understanding is that unless the 
>>>>>>>> font is non-standard, augmenting the training data will not be very 
>>>>>>>> useful. 
>>>>>>>> 
>>>>>>>> Your help/suggestions will be greatly appreciated. 
>>>>>>>> 
>>>>>>>> Thank you,
>>>>>>>> Amit Rao
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>>>>> To view this discussion on the web visit 
>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/f7ed92d0-6448-48c8-a404-774965d9b35a%40googlegroups.com.
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>> 
>>> 
