Re: [tesseract-ocr] Incorrect recognition of Latin words inside Arabic text

Zdenko Podobny Fri, 02 Sep 2022 11:35:26 -0700

Please stop abusing the tesseract forum. Why are you sending the same email
again and again?


Zdenko


On Fri, 2 Sep 2022 at 20:24, Naourass Derouichi <[email protected]> wrote:

> Hi all, I'm trying to OCR images similar to the attached one, but the
> error rate for Latin words is too high.
>
> I tried all PSMs with the following models from tessdata_best: *ara*,
> *eng*, *fra*, *Ara* (in different orders). I even tried fine-tuning them
> on the font used in the input images.
>
> *Sample output (error in bold):*
> قرارلمجلس المنافسة عدد 0028/ق/2022 صبادر25 من شعبان 1443
> (28 مارس 2022) والمتعلق بتولي الشركة القابضة للمساهمات
> والاستثمارات *«11010108-:2م1]»* للمر اقبة المشتركة على شركة
> ‎«CMGP Group Sa»‏ وذلك عبراقتناء نسبة14,81 96 من أسيم
> رأسمالها وحقوق التصويت المرتبطة به.
>
> The results often have incorrect recognition of Latin words. Is there any
> solution to this issue?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/5610a81a-a1d9-4b0d-bbc5-1c2cd60d4239n%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/5610a81a-a1d9-4b0d-bbc5-1c2cd60d4239n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
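For reference, the combinations described in the quoted message (every page segmentation mode crossed with every ordering of the models) can be enumerated with a short script. This is only a sketch under stated assumptions: the helper names `language_orders` and `build_commands`, the image name `page.png`, and the PSM range 0 to 13 are illustrative and do not come from the original post. It uses only the `-l`, `--psm` and `--tessdata-dir` options of the tesseract command line, and it builds the command lines without running them.

```python
from itertools import permutations, product


def language_orders(models):
    """Return every ordering of the models as a '+'-joined value for -l."""
    return ["+".join(order) for order in permutations(models)]


def build_commands(image, models, psms=range(14), tessdata_dir=None):
    """Build one tesseract command line per (language order, PSM) pair.

    The function name, the default PSM range and the use of 'stdout' as the
    output base are assumptions made for this illustration.
    """
    commands = []
    for langs, psm in product(language_orders(models), psms):
        cmd = ["tesseract", image, "stdout", "-l", langs, "--psm", str(psm)]
        if tessdata_dir is not None:
            # Point tesseract at a specific model directory, e.g. tessdata_best.
            cmd += ["--tessdata-dir", tessdata_dir]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical input file; replace with a real image path.
    for cmd in build_commands("page.png", ["ara", "eng", "fra"], psms=[3, 6]):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` on a machine where tesseract and the listed models are installed, which makes it easier to compare the Latin words across runs side by side.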

