[tesseract-ocr] Training tesseract for arabic language

Mohamed Chebbi Fri, 03 Jun 2016 07:09:57 -0700

Hi all!

I tried the stock Arabic training data and the results are not good at all.
That is why I am trying to build my own ara.traineddata. Like you, I want
an Arabic OCR that reaches very good accuracy.
The tutorial says you can list your most frequently used words for a given
language in a text file. When I try to open ara.cube-word-dawg with
Notepad++, I can't read anything because, as Quan explained to me, it is a
binary file.
So how do I build my own ara.cube-word-dawg? When I run tesseract-ocr with
a tessdata folder containing all the languages, and I set the language to
ara, the output for an Arabic input document is wrong: not a single Arabic
character is recognized.
That is why I am trying to build my own ara.traineddata. I have already
created the .box, font_properties, inttemp, normproto, pffmtable, shapetable
and unicharset files, and I am only missing the dictionary data.
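
For the missing dictionary step, here is a minimal sketch of what I
understand the workflow to be. It assumes the dictionary starts as a plain
UTF-8 word list, one word per line, and that it is compiled with the
wordlist2dawg tool from the Tesseract training tools, called as
"wordlist2dawg WORDLIST DAWG UNICHARSET". The file names and both helper
functions are hypothetical, and I have not confirmed that the cube dawg
files are built the same way as ara.word-dawg.

```python
from pathlib import Path


def write_wordlist(words, path):
    """Write unique, non-empty words to a UTF-8 file, one word per line.

    The first occurrence of each word is kept, in input order.
    """
    seen = set()
    cleaned = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    Path(path).write_text("".join(w + "\n" for w in cleaned), encoding="utf-8")
    return cleaned


def wordlist2dawg_command(wordlist, dawg, unicharset):
    """Build (but do not run) the argument list for wordlist2dawg.

    Assumed argument order: word list, output dawg, unicharset.
    """
    return ["wordlist2dawg", str(wordlist), str(dawg), str(unicharset)]


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own training files.
    write_wordlist(["كتاب", "قلم", "كتاب"], "ara.words_list")
    print(" ".join(wordlist2dawg_command(
        "ara.words_list", "ara.word-dawg", "ara.unicharset")))
```

If this is right, the printed command would be run in the training
directory, and the resulting ara.word-dawg would be packed with the other
ara.* files by running "combine_tessdata ara.".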


Are the steps I have taken so far correct?
Why is Arabic not better optimized in Tesseract?

Regards

