[tesseract-ocr] Extraction of English and Thai text from documents

Prateek Sun, 17 May 2020 11:40:24 -0700

I have a bunch of documents that contain text in both English and Thai 
and are laid out as tables or forms. Some of the issues I'm facing while 
running tesseract with lang = "eng+thai" are:


1. The OCR reads Thai as English and English as Thai, as it doesn't seem to 
detect multiple languages in one line. I've tried different psm modes, but 
it's still failing to differentiate between English and Thai in a lot of 
cases.

2. The text in the document is small, and upscaling the document 
deteriorates the quality even further. How should I handle such a case?
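
To make issue 1 easier to measure, here is a rough sketch of how the 
mix-ups could be flagged in the OCR output. It assumes the output is 
already available as a Python string, and it treats any whitespace-separated 
token that contains both Thai and Latin letters as suspicious. The function 
names are hypothetical, and the heuristic is only an assumption: a form 
could legitimately place both scripts side by side without a space.

```python
def script_of(ch):
    """Classify one character as 'thai', 'latin', or 'other'."""
    if "\u0e00" <= ch <= "\u0e7f":        # Thai Unicode block
        return "thai"
    if ch.isascii() and ch.isalpha():     # plain A-Z / a-z
        return "latin"
    return "other"                        # digits, punctuation, etc.


def mixed_script_tokens(text):
    """Return the whitespace-separated tokens that mix Thai and Latin letters."""
    flagged = []
    for token in text.split():
        scripts = {script_of(ch) for ch in token}
        if "thai" in scripts and "latin" in scripts:
            flagged.append(token)
    return flagged


if __name__ == "__main__":
    # Hypothetical OCR output; the last token mixes both scripts.
    sample = "Name ชื่อ Naชe"
    print(mixed_script_tokens(sample))
```

Counting the flagged tokens per page would give a simple way to compare 
psm modes or preprocessing settings against each other.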




