Hi Jean-Marc,

I tested your picture with the French language parameters --psm 6 -l 'fra' and it works well. With English (-l eng), the C is indeed dropped; in fact, the C is read as a euro sign (€).

Hope it helps.
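A minimal sketch of the check described above, assuming bash. `normalize_code` is a hypothetical helper, not part of tesseract. It strips whitespace from the OCR output and maps a leading € back to C, since that is the misreading seen with -l eng. It then accepts only "one letter, 4 digits", printed as "letter space digits".

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of tesseract): normalize raw OCR text to
# "<letter> <4 digits>", or return 1 if it does not match that shape.
normalize_code() {
  local raw="$1"
  # Drop all whitespace, including the trailing newline/form feed
  # that tesseract may append to its text output.
  raw="${raw//[[:space:]]/}"
  # With -l eng the leading C can come out as a euro sign; map it back.
  if [[ "$raw" == €* ]]; then
    raw="C${raw#€}"
  fi
  # Accept exactly one uppercase letter followed by 4 digits.
  if [[ $raw =~ ^([[:upper:]])([[:digit:]]{4})$ ]]; then
    printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

# Usage (requires tesseract with the fra traineddata installed):
# normalize_code "$(tesseract output6.png stdout --psm 6 -l fra)"
```

If the letter is missing entirely (only "1234" comes back), the helper fails rather than guessing, so you can fall back to another language or PSM.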
Best regards,
René

On Tue, Apr 2, 2024 at 14:46, Jean-Marc Spaggiari <[email protected]> wrote:
> Hi,
>
> I'm trying to OCR short words in the form of a letter, a space, and 4 numbers.
>
> I'm doing a lot of pre-processing to get the picture cleaned, and so far I
> arrive at something like this:
> [image: output6.png]
> My challenge is that tesseract is only detecting the numbers. I tried all
> the possible PSMs with the same result. The leading C is always ignored.
>
> This is the command line that I am running:
> tesseract -c tessedit_char_whitelist=" 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" output6.png stdout
>
> I tried with tesseract 5.3.0 and tesseract 5.3.4-45-g87a15 with the same
> result.
>
> I'm looking for some recommendations on what I can do better to help
> tesseract detect the leading C correctly.
>
> Thanks,
>
> JMS
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/cf6a3a25-732a-4214-8ce3-03a90a719c8dn%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/cf6a3a25-732a-4214-8ce3-03a90a719c8dn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .

