Have you tried `osd` (orientation and script detection)?

On Mon, Nov 25, 2019 at 8:13 PM Jeetendra Ahuja <jeetendra.ahuja...@gmail.com> wrote:
> So before processing a document, we want to reject the ones which are CJK, so
> I've used Tesseract for this. It does a pretty good job, but sometimes when
> the document quality is low, most of the dots on the "Table of Contents" page
> are recognized as "CJK" characters. I am planning to create my own
> training data but wanted to get advice from experts.
>
> *Config:*
>
> - Tesseract 4.0
> - instance.setLanguage("chi_simB+chi_traB+korB+jpnB+engB");
> - instance.setOcrEngineMode(1);
>
> Image is zoomed to 600% in Adobe PDF reader.
>
> Please let me know.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/95138faa-307f-4417-b72c-648ab84993d9%40googlegroups.com

--
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
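
A rough sketch of how the OSD suggestion could be used as a pre-filter, in case it helps. It does not call Tess4J; it only parses the text that `tesseract page.png - --psm 0` prints (OSD-only mode, which needs `osd.traineddata` installed) and decides whether to reject the page. The class name `OsdScriptFilter`, the method names, the list of script names treated as CJK, and the confidence threshold are all assumptions made for illustration, so check them against what your own Tesseract build reports.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class OsdScriptFilter {
    // Assumption: script names that should count as CJK. Verify against
    // the names your OSD model actually prints.
    static final Set<String> CJK_SCRIPTS =
            Set.of("Han", "Hangul", "Japanese", "Korean", "Hiragana", "Katakana");

    /** Parses the "Key: value" lines of OSD output into a map. */
    static Map<String, String> parseOsd(String osdText) {
        Map<String, String> fields = new HashMap<>();
        for (String line : osdText.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                fields.put(line.substring(0, colon).trim(),
                           line.substring(colon + 1).trim());
            }
        }
        return fields;
    }

    /**
     * Returns true only when OSD reports a CJK script with at least
     * minConfidence. Missing or unparsable fields never cause a rejection.
     */
    static boolean shouldRejectAsCjk(String osdText, double minConfidence) {
        Map<String, String> fields = parseOsd(osdText);
        String script = fields.get("Script");
        String confidenceText = fields.get("Script confidence");
        if (script == null || confidenceText == null) {
            return false;
        }
        double confidence;
        try {
            confidence = Double.parseDouble(confidenceText);
        } catch (NumberFormatException e) {
            return false;
        }
        return CJK_SCRIPTS.contains(script) && confidence >= minConfidence;
    }

    public static void main(String[] args) {
        // Sample text in the shape of `tesseract page.png - --psm 0` output.
        String sample = String.join("\n",
                "Page number: 0",
                "Orientation in degrees: 0",
                "Rotate: 0",
                "Orientation confidence: 15.33",
                "Script: Latin",
                "Script confidence: 2.00");
        // The threshold 1.0 is an arbitrary illustrative value.
        System.out.println("reject as CJK: " + shouldRejectAsCjk(sample, 1.0));
    }
}
```

Because OSD makes one script decision for the whole page instead of recognizing individual characters, a row of leader dots on a contents page may be less likely to flip the result, but that is worth confirming on your low-quality scans.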