[tesseract-ocr] Re: Extraction of English and Thai text from documents

Piyush Chandra Thu, 28 May 2020 00:23:08 -0700

1. Tesseract has always had trouble with tables. I would suggest removing 
the tables and preprocessing the image (upscaling, thresholding, converting 
to grey scale, etc.) to improve accuracy.


2. Try posting your sample images and results to get a better reply.
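
For point 1, here is a rough sketch of what those preprocessing steps do, 
written in plain Python so it runs without extra libraries. The helper names 
(to_grey, upscale, otsu_threshold, binarize) are made up for illustration, 
and Otsu's method is only one common way to pick a threshold; it is my 
assumption here, not something tesseract requires. On real scans you would 
normally do the same steps with an image library rather than nested lists.

```python
# Illustrative helpers only: images are lists of rows, pixels are 0-255.

def to_grey(rgb):
    """Convert rows of (r, g, b) tuples to grey values (BT.601 luma weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]


def upscale(grey, factor=2):
    """Nearest-neighbour upscaling: repeat each pixel `factor` times per axis."""
    out = []
    for row in grey:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def otsu_threshold(grey):
    """Pick the threshold that maximises between-class variance (Otsu)."""
    hist = [0] * 256
    for row in grey:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break               # no bright pixels left
        m0 = sum0 / w0
        m1 = (total_sum - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(grey, threshold):
    """Map pixels above the threshold to white (255), the rest to black (0)."""
    return [[255 if p > threshold else 0 for p in row] for row in grey]


if __name__ == "__main__":
    # Tiny fake "scan": dark text pixels (30) on a light background (220).
    rgb = [[(30, 30, 30), (220, 220, 220)],
           [(220, 220, 220), (30, 30, 30)]]
    grey = upscale(to_grey(rgb), factor=2)
    binary = binarize(grey, otsu_threshold(grey))
    for row in binary:
        print(row)
```

Nearest-neighbour upscaling is used only to keep the sketch short. On small 
text a smoother interpolation usually gives cleaner glyph edges, so try a 
few resampling methods on your own images before settling on one.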

On Monday, 18 May 2020 00:09:45 UTC+5:30, Prateek wrote:
>
>
> I have a bunch of documents that contain text in both English and Thai 
> and are structured in a tabular / form-like layout. Some of the issues 
> I'm facing while running tesseract with lang = "eng+thai" are:
>
> 1. The OCR is reading Thai as English and English as Thai, as it doesn't 
> detect multiple languages in one line. I've tried different psm modes but 
> it's still failing to differentiate between English and Thai in a lot of 
> cases.
>
> 2. The text in the document is small, and upscaling the document 
> deteriorates the quality even further. How should I handle such a case?
>

