Please see https://github.com/Shreeshrii/tesstrain-ckb

That repository shows fine-tuning from script/Arabic, using training text and fonts.

You would need to follow steps similar to these scripts:

https://github.com/Shreeshrii/tesstrain-ckb/blob/master/0-setup.sh
https://github.com/Shreeshrii/tesstrain-ckb/blob/master/2-txt2img.sh
https://github.com/Shreeshrii/tesstrain-ckb/blob/master/3-img2lstmf.sh
https://github.com/Shreeshrii/tesstrain-ckb/blob/master/4-train-layer.sh
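
For orientation, here is a rough sketch of the usual tesseract 4 fine-tuning sequence that scripts like these typically wrap. It is not a copy of the linked scripts, which may use different options (for example, replacing a layer rather than plain fine-tuning). The function name, the file layout, the font name, the fonts directory and the iteration count are all assumptions for illustration. The code only builds and prints the command lines; it does not run them.

```python
import shlex
from pathlib import Path


def build_finetune_commands(lang, font, text_file, work_dir,
                            fonts_dir="/usr/share/fonts", max_iterations=400):
    """Return argv lists for a plain fine-tune run. Nothing is executed here."""
    work = Path(work_dir)
    # Hypothetical naming scheme for the rendered page, e.g. Arabic.Some_Font.exp0
    base = work / f"{lang}.{font.replace(' ', '_')}.exp0"
    # Assumes the starting traineddata (e.g. from tessdata_best) was copied here
    traineddata = work / f"{lang}.traineddata"
    start_model = work / f"{lang}.lstm"
    # Assumes you write this file yourself: one .lstmf path per line
    list_file = work / f"{lang}.training_files.txt"
    output_prefix = work / "finetuned"
    return [
        # 1. Render the training text with the font (writes .tif and .box)
        ["text2image", f"--text={text_file}", f"--outputbase={base}",
         f"--font={font}", f"--fonts_dir={fonts_dir}"],
        # 2. Turn the image/box pair into an .lstmf training file
        ["tesseract", f"{base}.tif", str(base), "--psm", "6", "lstm.train"],
        # 3. Extract the LSTM model from the existing traineddata
        ["combine_tessdata", "-e", str(traineddata), str(start_model)],
        # 4. Continue training from that model on the new .lstmf files
        ["lstmtraining", "--continue_from", str(start_model),
         "--traineddata", str(traineddata),
         "--train_listfile", str(list_file),
         "--model_output", str(output_prefix),
         "--max_iterations", str(max_iterations)],
        # 5. Convert the final checkpoint into a usable traineddata file
        ["lstmtraining", "--stop_training",
         "--continue_from", f"{output_prefix}_checkpoint",
         "--traineddata", str(traineddata),
         "--model_output", str(work / f"{lang}_finetuned.traineddata")],
    ]


if __name__ == "__main__":
    # "Some Arabic Font" and the paths are placeholders, not tested values
    for cmd in build_finetune_commands("Arabic", "Some Arabic Font",
                                       "training_text.txt", "work"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Steps 1 and 2 would normally be repeated for every font and text page before step 4. A realistic iteration count depends on how much training data you have.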




On Tue, Jan 28, 2020 at 12:08 PM manu pranay <pranaymanu3...@gmail.com>
wrote:

> shree,
> can you please help me with how to perform Arabic training on Tesseract 4?
>
> thank you
>
>
> On Thursday, May 4, 2017 at 3:22:42 PM UTC+5:30, shree wrote:
>>
>> Ibr,
>>
>> You are incorrect in your description of LSTM training.
>>
>> What you are doing will use the ara.traineddata provided in the repo,
>> so there will be no change in output.
>>
>> Once lstmf files are created, you have to run lstmtraining, which will run
>> for days/weeks to give you a good result.
>>
>> Please read about LSTM training on wiki.
>>
>> On May 4, 2017 2:58 PM, "Ibr" <ibr....@gmail.com> wrote:
>>
>>> If you are referring to tesseract 4.00alpha with leptonica 1.74.1, and
>>> if you compiled them correctly and got the binaries that you need
>>> for training with lstmf files, then I recommend following the suggestion
>>> made by the tesseract devs: once you create an .lstmf file for a
>>> certain font (one that can be used for Arabic writing), get the official
>>> ara.traineddata file from GitHub, paste it in the tessdata folder and the
>>> lstmf file in the tesseract folder, and run the command: tesseract text_image
>>> result_text -l ara --oem 1
>>> Which Arabic characters exactly are you trying to improve the accuracy
>>> for?
>>>
>>> On Saturday, April 8, 2017 at 11:52:25 AM UTC+3, Ahmad Moawad wrote:
>>>
>>>> Hello All,
>>>>
>>>>
>>>> I want to train Tesseract 4.0 for the Arabic language. The
>>>> results of this version are great but still need some tuning, so I got
>>>> jTessBoxEditor 2.0 beta.
>>>> I tried to correct the incorrect characters and build ara.traineddata.
>>>> After copying the ara.traineddata to
>>>> /usr/share/tesseract-ocr/4.00/tessdata, I got random characters when I ran
>>>> tesseract on the image.
>>>> So, any suggestions on how to do training for version 4.0? I already
>>>> know that the cube engine from the last version, 3.0x, isn't included in
>>>> 4.0 LSTM. Or should I wait until Ray makes another updated ara.traineddata?
>>>>
>>>> Thanks.
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesser...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/1c842b1e-1dc1-418b-a5b7-368c11e7dfa5%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/1c842b1e-1dc1-418b-a5b7-368c11e7dfa5%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>


-- 

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
