You are correct, I did miss that section. Inverting the image seems to produce better results.
Because the images are simple and the resulting text was not even close, I assumed this was less a quality problem than an option I was missing, so that is what I was looking for. Anyway, thank you for pointing it out.

On Sunday, August 10, 2025 at 3:44:50 PM UTC-4 zdenop wrote:

> Seems like you missed this:
> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md...
>
> Zdenko
>
> On Sun, Aug 10, 2025 at 21:16, Thomas McGrew <[email protected]> wrote:
>
>> I had looked through that some, but I looked again and I don't see
>> anything in the documentation that addresses this problem. Is there
>> something in particular in the documentation that I should read?
>>
>> I know how to install the application, as I have already done so, and I
>> know how to run it. I'm mostly using it via pyocr, but the command line
>> gives the same result. I have the models for OSD, English and Japanese
>> installed. I have run tesseract on thousands of images like this, and 99%
>> of the time it works fine.
>>
>> If the model sometimes hallucinates and there is nothing to be done, then
>> that's fine and just something I'll have to work around. I did find that
>> scaling an image to a different size does generally make tesseract read
>> the text correctly when this happens, for whatever reason.
>>
>> On Sunday, August 10, 2025 at 5:18:53 AM UTC-4 zdenop wrote:
>>
>>> https://github.com/tesseract-ocr/tessdoc
>>>
>>> Zdenko
>>>
>>> On Sun, Aug 10, 2025 at 10:25, Thomas McGrew <[email protected]> wrote:
>>>
>>>> I read the man page and the command line help. Unless you're referring
>>>> to some other documentation, then yes, I read it.
>>>>
>>>> Thomas McGrew
>>>>
>>>> On Sun, Aug 10, 2025, 03:41 Thomas McGrew <[email protected]> wrote:
>>>>
>>>>> I'm trying to understand why tesseract is detecting this text
>>>>> incorrectly.
>>>>>
>>>>> --oem 0 has issues with italics, so I've been using --oem 1. However,
>>>>> on this one image (the only one I've noticed so far), the result seems
>>>>> to be totally incorrect.
>>>>>
>>>>> The image clearly contains only the text "'Kaay."
>>>>> Yet tesseract reads the text with --oem 1 as "LECEVA"
>>>>> --oem 0 does read the text correctly.
>>>>>
>>>>> I'm using the default psm of 3, but no others I have tried seem to
>>>>> read the text correctly.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to a topic in the
>>>>> Google Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this topic, visit
>>>>> https://groups.google.com/d/topic/tesseract-ocr/TRLTSbSg_30/unsubscribe
>>>>> To unsubscribe from this group and all its topics, send an email to
>>>>> [email protected].
>>>>> To view this discussion visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/017ef73f-c695-4a06-819a-9f2b46ab3e89n%40googlegroups.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion visit https://groups.google.com/d/msgid/tesseract-ocr/718b7ed7-0298-4c6e-8a59-d101d9d7221cn%40googlegroups.com.
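
For anyone who lands on this thread with the same symptom, the two workarounds mentioned above (inverting the image and scaling it to a different size) can be tried side by side against both engine modes. What follows is only a rough sketch, not anything posted in the thread: the helper names `tesseract_command`, `preprocess`, and `run_ocr` are made up for illustration, the preprocessing step assumes Pillow is installed, and the scale factor is an arbitrary choice. It calls the `tesseract` command line directly rather than pyocr, since the thread reports that both give the same result.

```python
import subprocess


def tesseract_command(image_path, oem=1, psm=3, lang="eng"):
    """Build the argument list for one tesseract run that prints text to stdout."""
    return [
        "tesseract", str(image_path), "stdout",
        "--oem", str(oem),  # 0 = legacy engine, 1 = LSTM engine
        "--psm", str(psm),  # 3 = fully automatic page segmentation (the default)
        "-l", lang,
    ]


def preprocess(src_path, dst_path, invert=True, scale=1.0):
    """Write a grayscale copy of src_path, optionally inverted and rescaled."""
    # Imported lazily so tesseract_command() stays usable without Pillow.
    from PIL import Image, ImageOps

    img = Image.open(src_path).convert("L")  # ImageOps.invert needs L or RGB
    if invert:
        img = ImageOps.invert(img)
    if scale != 1.0:
        new_size = (
            max(1, round(img.width * scale)),
            max(1, round(img.height * scale)),
        )
        img = img.resize(new_size, Image.LANCZOS)
    img.save(dst_path)
    return dst_path


def run_ocr(image_path, **options):
    """Run tesseract on image_path and return the recognized text, stripped."""
    result = subprocess.run(
        tesseract_command(image_path, **options),
        capture_output=True,
        text=True,
        check=True,  # raise if tesseract exits with an error
    )
    return result.stdout.strip()
```

A comparison could then look like `run_ocr(preprocess("sample.png", "sample_fixed.png", scale=2.0), oem=1)` next to the same call with `oem=0`, where `sample.png` is a placeholder file name. Whether inversion, rescaling, or both help will depend on the image, so it is probably worth checking each change on its own.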

