[tesseract-ocr] Recognizing blurred dots as CJK characters

Jeetendra Ahuja Mon, 25 Nov 2019 06:44:33 -0800

Before processing a document, we want to reject the ones that are CJK, so 
I've used Tesseract for this. It does a pretty good job, but sometimes, when 
the document quality is low, most of the dots on a "Table of Contents" page 
are recognized as CJK characters. I am planning to create my own 
training data but wanted to get advice from the experts first.


*Config:*

   - Tesseract 4.0
   - instance.setLanguage("chi_simB+chi_traB+korB+jpnB+engB");
   - instance.setOcrEngineMode(1);
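
For context, one possible post-processing safeguard is sketched below. It does not fix the misrecognition itself; it only rejects a page as CJK when CJK letters make up a large share of all recognized letters, so a few dot leaders misread as CJK on an otherwise English page would not trigger a rejection. The class name CjkFilter, its methods, and the threshold value are hypothetical, not part of Tesseract or Tess4J, and the threshold would need tuning on real documents.

```java
import java.lang.Character.UnicodeScript;

/** Hypothetical helper: estimates how much of an OCR result is CJK text. */
final class CjkFilter {

    /** Returns true if the code point belongs to a Han, kana, or Hangul script. */
    static boolean isCjk(int codePoint) {
        UnicodeScript script = UnicodeScript.of(codePoint);
        return script == UnicodeScript.HAN
            || script == UnicodeScript.HIRAGANA
            || script == UnicodeScript.KATAKANA
            || script == UnicodeScript.HANGUL;
    }

    /** Fraction of letter code points that are CJK; 0.0 when there are no letters. */
    static double cjkFraction(String text) {
        int letters = 0;
        int cjk = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue; // skip digits, punctuation, whitespace
            }
            letters++;
            if (isCjk(cp)) {
                cjk++;
            }
        }
        return letters == 0 ? 0.0 : (double) cjk / letters;
    }

    /** Treats the text as CJK only if the CJK share reaches the (assumed) threshold. */
    static boolean looksCjk(String text, double threshold) {
        return cjkFraction(text) >= threshold;
    }
}
```

The string returned by the OCR call would be passed to CjkFilter.looksCjk(text, 0.5), where 0.5 is only a starting guess. On a page where the misread dots outnumber the real letters this check would still fail, so it complements retraining or image cleanup rather than replacing them.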


Image is zoomed to 600% in Adobe PDF reader.

Please let me know.
