Try the fine-tuned traineddata from https://github.com/Shreeshrii/tessdata_shreetest/commit/0108263ad0c4c9bd11e0c8190a81fb36e2e4e56a
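
If the fine-tuned model does not help, the "external workarounds" mentioned in the quoted message below could take the form of a post-processing pass over Tesseract's text output. The following is only a rough sketch under stated assumptions, not something Tesseract provides: `clean_ocr_text` and `SUBSTITUTIONS` are hypothetical names, the only substitution taken from this thread is '|' -> '1', and space and newline are added to the allowed set so that word and line breaks survive.

```python
import re

# Anything NOT in the character class from the original question,
# [-a-zA-Z0-9,.$()/], is matched for removal. Space and newline are
# added as an assumption so that words and lines stay separated.
DISALLOWED = re.compile(r"[^\-a-zA-Z0-9,.$()/ \n]")

# Hypothetical confusion table. Only '|' -> '1' comes from the thread;
# extend it with whatever substitutions show up in your own receipts.
SUBSTITUTIONS = {"|": "1"}


def clean_ocr_text(text, substitutions=SUBSTITUTIONS):
    """Map known confusions to the likely intended character,
    then drop every character outside the allowed set."""
    mapped = text.translate(str.maketrans(substitutions))
    return DISALLOWED.sub("", mapped)


if __name__ == "__main__":
    print(clean_ocr_text("TOTAL $|2.50"))
```

Applying the substitutions before the filter matters: filtering first would delete the '|' instead of recovering the '1'. This only repairs the output text; it does not change what the recognizer itself considers.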

On Sat, Mar 30, 2019 at 1:47 AM Martin Emmerson <[email protected]> wrote:

> Yikes! Thanks for the reply, but I could barely follow the discussion on
> that pull request. It seems the answer at least for now is that there
> isn't a straightforward way to restrict character set without being
> somewhat familiar with the code base and dev environment (which I'm not).
> Thanks anyway; I'll try to figure out some external workarounds.
>
> On Thursday, March 28, 2019 at 11:03:59 PM UTC-7, shree wrote:
>>
>> See https://github.com/tesseract-ocr/tesseract/pull/2294
>>
>> On Fri, 29 Mar 2019, 11:17 Martin Emmerson, <[email protected]> wrote:
>>
>>> Is there a way to restrict the character set that tesseract-ocr will
>>> attempt to identify? I'm scanning USA-based receipts which have a fairly
>>> simple set of monospaced characters but, for example, often '1' will get
>>> misidentified as '|', and a whole host of other simple substitution
>>> errors. If I could just restrict tesseract to [-a-zA-Z0-9,.$()/] it would
>>> be an immediate boost to accuracy. (Hoping for a way that doesn't involve
>>> having to retrain from scratch on the limited set.)

--
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduXLTNbXFz5mFNoHnurW8orWyudiA0FCOqF_tq_gVoONAA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

