Hello. I've trained Tesseract for a specific font on a carefully scaled and baseline-aligned set of images. It works well for the most part, but in some situations Tesseract fails to recognize a single line of text.
For example, I have this image:

[image: image1.png]

Recognizing it with PSM 7 gives complete garbage:

*1 P9ACT69HT9лpв A9P96BPi*

But if I take the bounding box and baseline of the recognized result

[image: image2.png]

apply the font metrics that were used to train Tesseract (38 = 6 + 24 + 8 pixels for leading + ascent + descent), crop the input image to a height of 38 pixels positioned according to the baseline

[image: image3.png]

and use PSM 13 (raw line), I get the correct result:

*Пpeдcтaвитeль дepжaвы*

So my question is: what is going on? Tesseract correctly detects the bounding box and baseline of the text, yet it produces complete garbage in PSM 7 and the correct result in PSM 13. How can I avoid having to detect the text line twice?

Also, in many cases Tesseract produces the correct result with PSM 7 (as well as other modes, such as 3, 4, etc.), but in many other cases it produces garbage, so I have to extract the bounding boxes and baselines every time just to check that I got a correct result.
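To make the crop step concrete, here is a rough sketch of how the 38-pixel strip can be derived from the detected baseline. The names `FontMetrics` and `crop_rows` are made up for illustration, and the sketch assumes a horizontal baseline, image rows counted downward from the top, and all of the leading placed above the ascent. None of this is Tesseract API; it is only the arithmetic.

```python
from typing import NamedTuple, Tuple


class FontMetrics(NamedTuple):
    """Vertical font metrics in pixels (hypothetical helper type)."""
    leading: int
    ascent: int
    descent: int

    @property
    def line_height(self) -> int:
        return self.leading + self.ascent + self.descent


def crop_rows(baseline_y: int, metrics: FontMetrics, image_height: int) -> Tuple[int, int]:
    """Return (top, bottom) pixel rows of the strip to crop, bottom exclusive.

    Assumes y grows downward and that the leading sits entirely above
    the ascent. The result is clamped to the image, so the strip can be
    shorter than the full line height near the image edges.
    """
    top = baseline_y - metrics.ascent - metrics.leading
    bottom = baseline_y + metrics.descent
    return max(0, top), min(image_height, bottom)


# Metrics from the training setup described above: 6 + 24 + 8 = 38 pixels.
metrics = FontMetrics(leading=6, ascent=24, descent=8)
top, bottom = crop_rows(baseline_y=50, metrics=metrics, image_height=100)
print(top, bottom, bottom - top)
```

The resulting rows can then be cut out with any image library and the strip passed to Tesseract with `--psm 13`.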

