Hi Jean-Marc,
I tested your picture with the French language parameters --psm 6 -l fra,
and it works well.
With English (-l eng), the C is indeed dropped.
In fact, the C is recognized as a euro sign (€).
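Below is a minimal shell sketch for checking whether the output has the expected "one letter, a space, 4 digits" shape. It assumes a POSIX shell with grep, and tesseract on the PATH. The function name is_expected_format is hypothetical. Keeping your whitelist together with -l fra in the commented usage line is also an assumption, since the test above only mentions --psm 6 -l fra.

```shell
#!/bin/sh
# Sketch only: is_expected_format is a hypothetical helper, not part of Tesseract.
# Succeeds if any line of "$1" is exactly one capital letter, a space,
# and four digits (e.g. "C 1234").
# LC_ALL=C makes [A-Z] match ASCII capitals only, whatever the user's locale.
is_expected_format() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Z] [0-9]{4}$'
}

# Example usage (assumes tesseract is installed and output6.png exists):
#   text=$(tesseract output6.png stdout --psm 6 -l fra \
#     -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ)
#   if is_expected_format "$text"; then echo "OK: $text"; else echo "Unexpected: $text"; fi
```

Running the commented command once with -l eng and once with -l fra should make the difference easy to spot.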
Hope it helps.

Best regards
René

On Tue, Apr 2, 2024 at 14:46, Jean-Marc Spaggiari <[email protected]>
wrote:

> Hi,
>
> I'm trying to OCR short words in the form of a letter, a space, and 4 digits.
>
> I'm doing a lot of pre-processing to clean up the picture, and so far I
> have arrived at something like this:
> [image: output6.png]
> My challenge is that tesseract only detects the digits. I tried all
> the possible PSM values with the same result. The leading C is always ignored.
>
> This is the command line that I am running:
> tesseract -c tessedit_char_whitelist="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" output6.png stdout
>
> I tried with tesseract 5.3.0 and tesseract 5.3.4-45-g87a15 with the same
> result.
>
> I'm looking for some recommendations on what I can do better to help
> tesseract detect the leading C correctly.
>
> Thanks,
>
> JMS
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/cf6a3a25-732a-4214-8ce3-03a90a719c8dn%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/cf6a3a25-732a-4214-8ce3-03a90a719c8dn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

