Is there a way to restrict the character set that tesseract-ocr will attempt to identify? I'm scanning US receipts, which use a fairly simple set of monospaced characters, but '1' is often misidentified as '|', and there is a whole host of other simple substitution errors. If I could just restrict tesseract to [-a-zA-Z0-9,.$()/], it would give an immediate boost to accuracy. (I'm hoping for a way that doesn't involve retraining from scratch on the limited set.)
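One commonly cited approach, which is not part of the original post, is Tesseract's `tessedit_char_whitelist` config variable. It can be set from the command line with `-c VAR=VALUE` and requires no retraining. The sketch below assumes a `tesseract` binary on PATH. The file names `receipt.png` / `receipt` and the helper `tesseract_command` are hypothetical. It spells out the character class above explicitly and builds an argv list, so characters such as `$` and `(` need no shell quoting. Be aware that the whitelist has been reported to be ignored by the LSTM engine in some 4.0-era releases, so check the behaviour of your installed version.

```python
import string
from typing import List

# The question's class [-a-zA-Z0-9,.$()/], spelled out as explicit characters.
RECEIPT_CHARS = "-" + string.ascii_lowercase + string.ascii_uppercase + string.digits + ",.$()/"


def tesseract_command(image_path: str, output_base: str, whitelist: str = RECEIPT_CHARS) -> List[str]:
    """Build an argv list for the tesseract CLI that sets tessedit_char_whitelist.

    Hypothetical helper: returned as a list (not a shell string) so it can be
    passed straight to subprocess.run without shell quoting.
    """
    return ["tesseract", image_path, output_base, "-c", f"tessedit_char_whitelist={whitelist}"]


if __name__ == "__main__":
    cmd = tesseract_command("receipt.png", "receipt")
    print(cmd)
    # To actually run it (assumes tesseract is installed and on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Because '|' is not in the whitelist, the recognizer should be steered toward an allowed character such as '1' instead. This only narrows the choices; it does not fix poor segmentation or low image quality.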
-- Posted to the Google Groups "tesseract-ocr" group: https://groups.google.com/group/tesseract-ocr

