Hello all,

I am trying to OCR an old (1963) Moroccan Arabic-English dictionary.
I tried jTessBoxEditor for the OCR and more or less managed to follow the instructions I found online, but at the very end Tesseract failed to produce the final _traineddata_ files.

My problem is that the book (a dictionary) is basically in English, so I used the eng language file for OCR. But it also contains transliterated text, which includes characters that are not used in English although they are Latin script. I tried to train Tesseract for those characters, e.g. by following this video: https://www.youtube.com/watch?v=8GdcyknL1ls, but failed. The other information I could find is also a bit confusing.

The characters I was trying to train are the letters g z d h r t s l with dots below and above, plus š, ž and a weird semi question mark. The transliteration script is also _italic_.

With the help of LibreOffice Writer and some trial and error, I also managed to identify a close approximation of the transliteration font (Latin Modern Roman Unslanted).

Can somebody versed in Tesseract OCR training help me train it (or do the OCR) for those letters/characters?

Attached are:
- my train script / font image (font: Latin Modern Roman Unslanted)
- a page from the dictionary which includes most of the characters I am trying to OCR

The dictionary has 500+ pages; half is English to Moroccan Arabic and the other half is Moroccan Arabic to English, so proper OCR would be truly appreciated.

Thank you for your help.

Have fun,
aum
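For reference, the character set described above could be written out in Unicode roughly as follows. This is only a sketch under assumptions: it guesses that every listed letter can take either a dot below (U+0323) or a dot above (U+0307), which may be more combinations than the dictionary actually uses, and it leaves out the "semi question mark" because its exact code point is not known from the description. The names `BASE_LETTERS`, `dotted_variants` and `target_charset` are made up for illustration.

```python
import unicodedata

# Letters named in the post; which dot each one really takes is an assumption.
BASE_LETTERS = "gzdhrtsl"
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE


def dotted_variants(letters=BASE_LETTERS):
    """Return each letter with a dot below and a dot above, NFC-normalized.

    NFC yields a single precomposed code point where Unicode defines one and
    keeps base letter + combining mark otherwise.
    """
    variants = []
    for ch in letters:
        for mark in (DOT_BELOW, DOT_ABOVE):
            variants.append(unicodedata.normalize("NFC", ch + mark))
    return variants


def target_charset():
    """Dotted letters plus s-caron and z-caron (the two haceks in the post)."""
    return dotted_variants() + ["\u0161", "\u017e"]


if __name__ == "__main__":
    # Print every candidate character with its code point(s) for checking
    # against the scanned page.
    for text in target_charset():
        codepoints = " ".join(f"U+{ord(c):04X}" for c in text)
        print(text, codepoints)
```

A list like this can be compared against the sample page to drop combinations that never occur, and to check that the training text and ground truth use one consistent normalization form.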

