[image: KoreanOCRExample.PNG]

Hi all, 

as a Tesseract/OCR newbie, I am currently working on deepening my 
understanding of the Tesseract foundations and OCR basics.
While doing so, I came across the following strange result: when scanning some 
Korean Wikipedia pages (related to mathematics), Tesseract is completely 
unable to produce a correct recognition result for this one passage.
Most of the other sample test passages showed no problems, and for this 
passage even pre-processing had no impact on the results. So what am I 
missing here? Are Korean language corpora so rare that the NN hasn't seen 
these particular letters?
I am attaching the scanned passage. Thank you in advance.

