I processed your image with the same parameters and it works well with
tesseract 5.3.
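
For reference, here is a minimal shell sketch of the comparison discussed in this thread: run the image with several languages at --psm 6 and keep the first result that fits the expected "letter, space, 4 digits" format. The function names (is_expected_format, ocr_with_lang, first_valid_reading) are hypothetical helpers, not part of tesseract, and the sketch assumes the fra and eng language data are installed.

```shell
# Sketch only: these helper names are hypothetical, not part of tesseract.

# Succeeds when the text is one uppercase letter, one space, and four digits,
# the format described in the original question (for example "C 0135").
is_expected_format() {
    printf '%s\n' "$1" | grep -Eq '^[A-Z] [0-9]{4}$'
}

# Runs tesseract on one image with the given language and --psm 6,
# printing the recognized text to stdout.
ocr_with_lang() {
    tesseract "$1" stdout --psm 6 -l "$2" 2>/dev/null
}

# Tries each language in turn; prints "<lang><TAB><text>" for the first
# result whose first line fits the expected format, or fails if none does.
first_valid_reading() {
    local image="$1" lang text
    shift
    for lang in "$@"; do
        text="$(ocr_with_lang "$image" "$lang" | head -n 1)"
        if is_expected_format "$text"; then
            printf '%s\t%s\n' "$lang" "$text"
            return 0
        fi
    done
    return 1
}
```

It could be called as `first_valid_reading output6.png eng fra`. Note that a format check like this only catches a dropped or misread leading letter; a misread digit such as "C 9135" is still well-formed and would pass.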

On Tue, Apr 2, 2024 at 16:12, Jean-Marc Spaggiari <[email protected]>
wrote:

> Oh, interesting! Thanks for the suggestion.
>
> Indeed, with that option the C is recognized. However, when I do that, it
> mistakes a 0 for a 9 on another example :(
>
> [image: output6.png]
>
> I get C 9135 with the 'fra' option and 0135 without.
>
> I built a small application to split the letters one by one and to run
> them individually through tesseract and I get C 0135 correctly. But it
> fails with other images. I'm wondering what's wrong with my input picture
> :-/
>
> JMS
>
>
> On Tuesday, April 2, 2024 at 09:49:49 UTC-4, [email protected]
> wrote:
>
>> Hi Jean-Marc,
>> I tested your picture with the French language parameters (--psm 6 -l 'fra')
>> and it works well.
>> With the English language (-l eng), the C is indeed dropped.
>> In fact, the C is read as a euro sign (€).
>> Hope it helps
>>
>> Best regards
>> René
>>
>>
>>
>>
>>
>> On Tue, Apr 2, 2024 at 14:46, Jean-Marc Spaggiari <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to OCR short strings in the form of a letter, a space, and 4
>>> digits.
>>>
>>> I'm doing a lot of pre-processing to clean up the picture, and so far
>>> I end up with something like this:
>>> [image: output6.png]
>>> My challenge is that tesseract only detects the digits. I tried
>>> all the possible PSM values with the same result. The leading C is always
>>> ignored.
>>>
>>> This is the command line that I am running:
>>> tesseract -c tessedit_char_whitelist="
>>> 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" output6.png stdout
>>>
>>> I tried with tesseract 5.3.0 and tesseract 5.3.4-45-g87a15 with the same
>>> result.
>>>
>>> I'm looking for recommendations on what I can do better to help
>>> tesseract detect the leading C correctly.
>>>
>>> Thanks,
>>>
>>> JMS
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/cf6a3a25-732a-4214-8ce3-03a90a719c8dn%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/cf6a3a25-732a-4214-8ce3-03a90a719c8dn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>

