Yikes! Thanks for the reply, but I could barely follow the discussion on that pull request. It seems the answer, at least for now, is that there isn't a straightforward way to restrict the character set without being somewhat familiar with the code base and dev environment (which I'm not). Thanks anyway; I'll try to figure out some external workarounds.
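A rough sketch of what one such external workaround might look like: post-process the OCR text after tesseract has run, instead of restricting tesseract itself. It uses the character set from the original question. Everything else is an assumption made for illustration: the function name clean_ocr_text, the CONFUSIONS table (only the '|' to '1' substitution comes from this thread), and the decision to also keep spaces and newlines.

```python
import re

# Matches any character NOT in the set from the original question.
# Space and newline are added here as an assumption, so that words and
# lines on the receipt stay separated.
DISALLOWED = re.compile(r"[^-a-zA-Z0-9,.$()/ \n]")

# Hypothetical confusion table: character tesseract emitted -> character
# that was probably on the receipt. Only '|' -> '1' is from the thread;
# extend it with substitutions observed on real receipts.
CONFUSIONS = {"|": "1"}


def clean_ocr_text(text, confusions=CONFUSIONS):
    """Map known misreads to their likely character, then drop
    everything that is still outside the allowed set."""
    mapped = "".join(confusions.get(ch, ch) for ch in text)
    return DISALLOWED.sub("", mapped)


if __name__ == "__main__":
    print(clean_ocr_text("TOTAL $|2.50"))  # prints: TOTAL $12.50
```

This is lossy by design. It cannot recover a character that was misread as another allowed character, and it silently drops anything unexpected, so it only makes sense when the receipts really are limited to that set.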
On Thursday, March 28, 2019 at 11:03:59 PM UTC-7, shree wrote:
>
> See https://github.com/tesseract-ocr/tesseract/pull/2294
>
> On Fri, 29 Mar 2019, 11:17 Martin Emmerson, <[email protected]> wrote:
>
>> Is there a way to restrict the character set that tesseract-ocr will
>> attempt to identify? I'm scanning USA-based receipts which have a fairly
>> simple set of monospaced characters but, for example, often '1' will get
>> misidentified as '|', and a whole host of other simple substitution
>> errors. If I could just restrict tesseract to [-a-zA-Z0-9,.$()/] it would
>> be an immediate boost to accuracy. (Hoping for a way that doesn't involve
>> having to retrain from scratch on the limited set.)
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/2180d37f-50fd-47e6-9f48-c3ff73b1569e%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.

