Re: Need Help To Train Teseract for Urdu Language

74yrs old Mon, 03 Nov 2008 03:25:53 -0800

Eight data files have to be generated. Please visit the Tesseract wiki,
where how to generate the data files is explained in detail. At present
Tesseract supports only left-to-right text. If you succeed in generating
the data files, you will have to read the output in the opposite
direction, i.e. left to right.
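
As a rough illustration of reading the output in the opposite direction,
here is a minimal Python sketch. It assumes the recognized text comes out
in left-to-right visual order, one text line per output line, and it simply
reverses each line while keeping combining marks attached to their base
letters. The function names reverse_line and reverse_text are made up for
this example and are not part of Tesseract.

```python
import unicodedata


def reverse_line(line: str) -> str:
    """Reverse one line, keeping combining marks with their base character."""
    clusters = []
    for ch in line:
        # A combining mark (e.g. an Urdu diacritic) stays attached to the
        # character that precedes it in the input.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


def reverse_text(text: str) -> str:
    """Reverse every line of the text independently; line order is unchanged."""
    return "\n".join(reverse_line(line) for line in text.split("\n"))
```

This is only a sketch under the assumption above; text that mixes digits or
Latin words with Urdu would need more careful handling than a plain reversal.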
cheers


On Mon, Nov 3, 2008 at 12:53 PM, Qurat-ul-Ain Akram
<[EMAIL PROTECTED]> wrote:

> Hi all
>
> I am working on Urdu OCR. I came to know about Tesseract and tried to
> train it for Urdu characters. The instructions for the training procedure
> say that it cannot support the right-to-left writing style. I tried
> training on the simple alphabet of Urdu as follows:
>
> 1.     I made a characters text file named UrduCharacters.txt with
>        UTF-8 encoding.
> 2.     From it, a TIF image was obtained and saved as UrduCharacters.tif.
> 3.     I ran the tesseract command to make the box file:
>
>               1.   tesseract UrduCharacters.tif UrduCharacters batch.nochop makebox
>
>               2.   tesseract UrduCharacters.tif UrduCharacters -l urd batch.nochop makebox
>
> I have tried both commands for training. In the second one an error
> occurs with the message "Unable to locate Urdunichaset file".
> In the first one the box file is generated with four characters, which
> are ~, 7, 7, !. If anyone has any idea about it, please let me know.
>
>
> Regards
> Ainie
>