I processed your image with the same parameters and it works well with tesseract 5.3.
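
A minimal sketch of such a run, assuming "the same parameters" means the `--psm 6 -l fra` options quoted below together with the whitelist from the original command. The helper names `ocr_label` and `is_label` are hypothetical, and the leading space in the whitelist is an assumption based on the "letter, space, 4 numbers" format.

```shell
#!/bin/sh
# Whitelist copied from the original command in this thread; the leading
# space is an assumption (it allows the gap between the letter and digits).
WHITELIST=" 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# ocr_label (hypothetical helper): OCR one image with --psm 6 and -l fra.
ocr_label() {
  tesseract "$1" stdout --psm 6 -l fra \
    -c tessedit_char_whitelist="$WHITELIST"
}

# is_label (hypothetical helper): succeed only for "letter, space, 4 digits".
is_label() {
  case "$1" in
    [A-Z]" "[0-9][0-9][0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}
```

Calling `ocr_label output6.png` prints the recognized text to stdout. The `is_label` check would catch a dropped letter (a result such as `0135`), but it cannot catch a digit misread such as `C 9135`, which still has the expected shape.
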
On Tue, Apr 2, 2024 at 16:12, Jean-Marc Spaggiari <[email protected]> wrote:

> Oh, interesting! Thanks for the suggestion.
>
> It does indeed work well to pick up the C. However, when I do that, it
> confuses a 0 with a 9 on another example :(
>
> [image: output6.png]
>
> I get C 9135 with the 'fra' option and 0135 without.
>
> I built a small application to split the letters one by one and to run
> them individually through tesseract, and I get C 0135 correctly. But it
> fails with other images. I'm wondering what's wrong with my input picture
> :-/
>
> JMS
>
>
> On Tuesday, April 2, 2024 at 09:49:49 UTC-4, [email protected] wrote:
>
>> Hi Jean-Marc,
>> I tested your picture with the French language parameter: --psm 6 -l
>> 'fra', and it works well.
>> With the English language, -l eng, the C is indeed dropped.
>> In fact, the C is seen as a euro sign (€).
>> Hope it helps
>>
>> Best regards
>> René
>>
>>
>> On Tue, Apr 2, 2024 at 14:46, Jean-Marc Spaggiari <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to OCR short words in the form of a letter, a space, 4
>>> numbers.
>>>
>>> I'm doing a lot of pre-processing to get the picture cleaned, and so far
>>> I arrive at something like this:
>>> [image: output6.png]
>>> My challenge is that tesseract is only detecting the numbers. I tried
>>> all the possible PSMs with the same result. The leading C is always ignored.
>>>
>>> This is the command line that I am running:
>>> tesseract -c tessedit_char_whitelist=" 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" output6.png stdout
>>>
>>> I tried with tesseract 5.3.0 and tesseract 5.3.4-45-g87a15 with the same
>>> result.
>>>
>>> I'm looking for some recommendations on what I can do better to help
>>> tesseract detect the leading C correctly.
>>>
>>> Thanks,
>>>
>>> JMS
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/cf6a3a25-732a-4214-8ce3-03a90a719c8dn%40googlegroups.com

