Is there a way to restrict the character set that tesseract-ocr will 
attempt to identify?  I'm scanning USA-based receipts which have a fairly 
simple set of monospaced characters but, for example, often '1' will get 
misidentified as '|', and a whole host of other simple substitution 
errors.  If I could just restrict tesseract to [-a-zA-Z0-9,.$()/] it would 
be an immediate boost to accuracy.  (Hoping for a way that doesn't involve 
having to retrain from scratch on the limited set.)
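
For illustration, here is a minimal sketch of the kind of restriction being asked about. It assumes the `tessedit_char_whitelist` config variable, passed with `-c` on the tesseract command line, is honored by the installed version and engine mode, which may not hold for every release. The helper names `build_tesseract_command` and `ocr_receipt` are hypothetical, not part of Tesseract.

```python
import string
import subprocess

# The character set from the question: [-a-zA-Z0-9,.$()/]
RECEIPT_CHARS = "-" + string.ascii_letters + string.digits + ",.$()/"


def build_tesseract_command(image_path, output_base="stdout", whitelist=RECEIPT_CHARS):
    """Build an argv list for the tesseract CLI (hypothetical helper).

    Assumption: the installed tesseract honors tessedit_char_whitelist
    for the engine mode in use.
    """
    return [
        "tesseract",
        image_path,
        output_base,  # "stdout" asks tesseract to print the text instead of writing a file
        "-c",
        f"tessedit_char_whitelist={whitelist}",
    ]


def ocr_receipt(image_path):
    """Run tesseract on one image and return the recognized text."""
    result = subprocess.run(
        build_tesseract_command(image_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits non-zero
    )
    return result.stdout
```

The command is passed as a list rather than a shell string so that `$`, `(` and `)` in the whitelist need no quoting. Calling `ocr_receipt("receipt.png")` requires the `tesseract` binary on the PATH.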

