Hi all!

I tried the original training data and the results are not good at all, which is why I am trying to build my own ara.traineddata. Like you, I want an Arabic OCR with very good accuracy.

The tutorial says you can list the most frequently used words of your language in a text file. When I open ara.cube-word-dawg in Notepad++, I can't read anything because, as Quan explained to me, it is a binary file. So how do I build my own ara.cube-word-dawg?

When I run tesseract-ocr with a tessdata folder containing all the languages and set the language to ara, the output for an Arabic document is wrong: not a single Arabic character is recognized. That is why I am trying to create my own ara.traineddata. I have already created the .box, font_properties, inttemp, normproto, pffmtable, shapetable and unicharset files; only the dictionary data is missing.

Am I right with the steps I have taken so far? And why is Arabic not optimized in Tesseract?

Regards

