Eight data files have to be generated. Please visit the Tesseract wiki, where generating the data files is explained in detail. At present Tesseract supports only left-to-right text. If you succeed in generating the data files, you will have to read the output in the opposite direction, since it comes out left to right.

Cheers
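P.S. In case it is useful, here is a minimal Python sketch of what "reading in the opposite direction" could look like for recognized text. It assumes the engine wrote the characters of each line in the order they appear on the page from left to right, so each line has to be reversed to get the logical Urdu order. The function names are made up for illustration and are not part of Tesseract. Combining marks are kept attached to their base character, but nothing else is handled (digits or embedded Latin text would come out reversed too).

```python
import unicodedata


def reverse_visual_line(line):
    """Reverse one line, keeping combining marks after their base character."""
    clusters = []
    for ch in line:
        # A combining mark (non-zero combining class) belongs to the
        # preceding base character, so glue it onto the last cluster.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


def reverse_ocr_text(text):
    """Reverse every line of the text; the order of the lines is unchanged."""
    return "\n".join(reverse_visual_line(line) for line in text.split("\n"))
```

You would call `reverse_ocr_text` on the contents of the output text file after recognition has finished.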
On Mon, Nov 3, 2008 at 12:53 PM, Qurat-ul-Ain Akram <[EMAIL PROTECTED]> wrote:
> Hi all
>
> I am working on Urdu OCR and came to know about Tesseract. I tried to
> train Tesseract for the Urdu characters. The training instructions say
> that it cannot support the right-to-left writing style. I tried training
> on the simple alphabet of Urdu myself, as follows:
>
> 1. I made a text file of the characters, named UrduCharacters.txt, with
>    UTF-8 encoding.
> 2. From it a TIF image was obtained and saved as UrduCharacters.tif.
> 3. I ran the tesseract command to make the box file:
>    1. *tesseract UrduCharacters.tif UrduCharacters batch.nochop makebox*
>    2. *tesseract UrduCharacters.tif UrduCharacters -l urd batch.nochop makebox*
>
> I have tried both commands for training. In the second one the
> error occurs indicating the message that "Unable to locate Urdunichaset
> file".
> In the second one the box file is generated with four characters, which are
> ~, 7, 7, ! . If anyone has any idea about it, please let me know.
>
> Regards
> Ainie

