Re: [tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2020-01-28 Thread Shree Devi Kumar
Please see https://github.com/Shreeshrii/tesstrain-ckb. This is for fine-tune training from script/Arabic, using text and fonts. You would need to do steps similar to https://github.com/Shreeshrii/tesstrain-ckb/blob/master/0-setup.sh

Re: [tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2020-01-27 Thread manu pranay
Shree, can you please help me with how to perform Arabic training on Tesseract 4? Thank you. On Thursday, May 4, 2017 at 3:22:42 PM UTC+5:30, shree wrote: > > Ibr, > > You are incorrect in your description of LSTM training. > > What you are doing will use the ara.traineddata provided in the repo,

[tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-05-04 Thread Ibr
Replied to it. On Thursday, May 4, 2017 at 3:06:34 PM UTC+3, Ahmad Moawad wrote: > > Check your email. > > On Thursday, May 4, 2017 at 1:51:04 PM UTC+2, Ibr wrote: >> >> ibr.h...@gmail.com >> >> On Thursday, May 4, 2017 at 2:47:12 PM UTC+3, Ahmad Moawad wrote: >>> >>> Ibr, give me your email! >>> >>>

[tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-05-04 Thread Ibr
ibr.ham...@gmail.com On Thursday, May 4, 2017 at 2:47:12 PM UTC+3, Ahmad Moawad wrote: > > Ibr, give me your email! > > On Thursday, May 4, 2017 at 1:06:22 PM UTC+2, Ibr wrote: >> >> While I was creating .lstmf files so that I can use them to recognize text in >> images, I found that some of the

Re: [tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-05-04 Thread Ibr
Hi Shree, actually I saw the section about lstmtraining, but what I said was the result of following the Tesseract messages. What happened from the beginning was that I used to train .traineddata files for English, and that worked fine, but for Arabic it was failing, so I saw the

[tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-05-04 Thread Ahmad Moawad
Check your email. On Thursday, May 4, 2017 at 1:51:04 PM UTC+2, Ibr wrote: > > ibr.h...@gmail.com > > On Thursday, May 4, 2017 at 2:47:12 PM UTC+3, Ahmad Moawad wrote: >> >> Ibr, give me your email! >> >> On Thursday, May 4, 2017 at 1:06:22 PM UTC+2, Ibr wrote: >>> >>> While I was creating .lstmf

[tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-05-04 Thread Ahmad Moawad
Ibr, give me your email! On Thursday, May 4, 2017 at 1:06:22 PM UTC+2, Ibr wrote: > > While I was creating .lstmf files so that I can use them to recognize text in > images, I found that some of the characters are recognized wrongly, > some of them are not integrated into Tesseract, and some

[tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-05-04 Thread Ibr
While I was creating .lstmf files so that I can use them to recognize text in images, I found that some of the characters are recognized wrongly: some of them are not integrated into Tesseract, and some of them are due to the way certain Arabic text itself is written. In this case Tesseract acts

[tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-05-04 Thread Ahmad Moawad
As for jTessBoxEditor 2.0, I tried it, but I didn't get any results! As for your question, "How much training data is sufficient to get the best results for a new font, e.g. how many TIFF pages?", I think this was mentioned in the Wiki: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 On

Re: [tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-05-04 Thread ShreeDevi Kumar
Ibr, You are incorrect in your description of LSTM training. What you are doing will use the ara.traineddata provided in the repo, so there will be no change in output. Once .lstmf files are created, you have to run lstmtraining, which will run for days/weeks to give you a good result. Please read
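As a rough illustration of the step Shree describes (creating .lstmf files alone changes nothing; lstmtraining must then be run on them), here is a minimal Python sketch that writes the list file of .lstmf paths and builds a fine-tuning command sequence. It assumes the commands follow the TrainingTesseract-4.00 wiki: extract the LSTM network with `combine_tessdata -e`, continue training with `lstmtraining --continue_from`, then convert the final checkpoint with `--stop_training`. The directory layout, the `ara` file prefix, and the helper names `write_list_file` and `finetune_commands` are hypothetical. Fine-tuning needs a float ("best") model, not the integer "fast" one.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def write_list_file(lstmf_dir: str, list_file: str) -> list[str]:
    """Hypothetical helper: list every .lstmf file in lstmf_dir, one path per line."""
    paths = sorted(str(p) for p in Path(lstmf_dir).glob("*.lstmf"))
    Path(list_file).write_text("\n".join(paths) + "\n", encoding="utf-8")
    return paths


def finetune_commands(best_traineddata: str, list_file: str,
                      out_dir: str, max_iterations: int = 400) -> list[list[str]]:
    """Hypothetical helper: build (but do not run) the fine-tuning command sequence."""
    base = f"{out_dir}/ara"  # assumed output prefix
    return [
        # 1. Extract the LSTM network from the existing float ("best") Arabic model.
        ["combine_tessdata", "-e", best_traineddata, f"{base}.lstm"],
        # 2. Continue training that network on the new .lstmf files.
        ["lstmtraining",
         "--continue_from", f"{base}.lstm",
         "--traineddata", best_traineddata,
         "--train_listfile", list_file,
         "--model_output", f"{base}_ft",
         "--max_iterations", str(max_iterations)],
        # 3. Convert the final checkpoint into a usable .traineddata file.
        ["lstmtraining", "--stop_training",
         "--continue_from", f"{base}_ft_checkpoint",
         "--traineddata", best_traineddata,
         "--model_output", f"{base}_ft.traineddata"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands so they can be checked before running them by hand.
    for cmd in finetune_commands("tessdata_best/ara.traineddata",
                                 "ara.training_files.txt", "output"):
        print(shlex.join(cmd))
```

Printing instead of executing is deliberate: a real run can take a long time, so it is worth checking paths and iteration counts first.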

[tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-05-04 Thread Ahmad Moawad
My scenario is about training from images, not from text. I want to fine-tune characters such as لمجرد (not ملجرد), and so on. On Thursday, May 4, 2017 at 11:28:13 AM UTC+2, Ibr wrote: > > if you are referring to tesseract 4.00alpha with Leptonica 1.74.1, and if > you compiled
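To check whether fine-tuning actually fixes errors like ملجرد being read instead of لمجرد, it helps to measure them. The sketch below computes character error rate (CER) from Levenshtein edit distance. This is a generic evaluation metric swapped in for illustration, not part of Tesseract's training tools, and the function names are hypothetical. It compares Unicode code points in logical (storage) order, not the rendered contextual glyph shapes of Arabic.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Ground truth vs. the swapped-letter OCR output from the message above.
    print(cer("لمجرد", "ملجرد"))  # 0.4: two of five letters differ
```

A swap of two adjacent letters counts as two edits under plain Levenshtein distance; a transposition-aware variant (Damerau-Levenshtein) would count it as one.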

[tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-05-04 Thread Ibr
If you are referring to Tesseract 4.00alpha with Leptonica 1.74.1, and if you compiled them correctly and got the binaries that you need for training .lstmf files, then I recommend following the suggestion made by the Tesseract devs, which is: once you create an .lstmf file for a
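For training from images rather than rendered text (the scenario Ahmad describes), each .lstmf file is produced by running `tesseract` on an image with the `lstm.train` config, with a matching .box file alongside it. The minimal Python sketch below builds those commands for every .tif that has a .box file with the same basename. It assumes the `tesseract <image> <outputbase> lstm.train` form; extra options such as `--psm` may be needed depending on the images, and the helper name `lstmf_commands` is hypothetical.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def lstmf_commands(image_dir: str) -> list[list[str]]:
    """Hypothetical helper: one tesseract command per .tif that has a matching .box."""
    cmds = []
    for tif in sorted(Path(image_dir).glob("*.tif")):
        if not tif.with_suffix(".box").exists():
            continue  # lstm.train needs the .box ground truth; skip unlabeled images
        outbase = str(tif.with_suffix(""))  # writes <outbase>.lstmf next to the image
        cmds.append(["tesseract", str(tif), outbase, "lstm.train"])
    return cmds


if __name__ == "__main__":
    for cmd in lstmf_commands("ara_images"):  # assumed image directory
        print(shlex.join(cmd))
```

Skipping images without a .box file keeps unlabeled pages out of the training set instead of failing partway through a batch.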

[tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-05-01 Thread bmwmine
> > I think jTessBoxEditor 2.0 has been updated to include Tesseract 4.00dev. 1- Could anybody confirm? I am not getting better results for Arabic using it. 2- How much training data is sufficient to get the best results for a new font, e.g. how many TIFF pages?

[tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-04-10 Thread Quan Nguyen
jTessBoxEditor 2.0 beta versions bundle the latest Tesseract 4.00alpha training executable. The training process for 4.00, however, has not been integrated into the program. The 3.0x training process is still supported. Check out the two videos that depict the 3.0x training process: