I have a set of documents that contain text in both English and Thai, laid out in a tabular / form-like structure. These are the issues I'm facing when running Tesseract with lang = "eng+thai":
1. The OCR reads Thai as English and English as Thai, since it doesn't seem to detect multiple languages within one line. I've tried different page segmentation modes (psm), but it still fails to distinguish English from Thai in a lot of cases.
2. The text in the documents is small, and upscaling the images degrades the quality even further.

How should I handle such a case?
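For reference, here is a minimal sketch of the kind of run described above, assuming the Tesseract command-line tool is installed and on PATH. The helper names and the image file name are hypothetical, not part of Tesseract. The language string is written as `eng+tha` because the stock Thai traineddata file is named `tha`; that is an assumption about the setup, so substitute whatever model name is actually installed. The three psm values are only examples of modes one might compare on form-like pages.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path, lang="eng+tha", psm=6):
    """Build the argv list for one Tesseract CLI run that prints text to stdout.

    Hypothetical helper: it only assembles the documented CLI options
    (-l for languages, --psm for the page segmentation mode).
    """
    return ["tesseract", str(image_path), "stdout", "-l", lang, "--psm", str(psm)]


def ocr_with_psm_modes(image_path, lang="eng+tha", psm_modes=(4, 6, 11)):
    """Run Tesseract once per page segmentation mode and collect the output.

    Returns a dict mapping each psm value to the recognized text, so the
    English/Thai mix-ups can be compared side by side across modes.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")

    results = {}
    for psm in psm_modes:
        cmd = build_tesseract_cmd(image_path, lang, psm)
        # Decode as UTF-8 explicitly so Thai characters survive regardless of locale.
        proc = subprocess.run(
            cmd, capture_output=True, encoding="utf-8", check=True
        )
        results[psm] = proc.stdout
    return results
```

Calling `ocr_with_psm_modes("form_page.png")` (hypothetical file name) would return one text result per mode, which makes it easier to show exactly which lines come out in the wrong script.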