I used "MarathiCursiveT Medium" as the font in --fontlist and it worked.
Thanks for that.
It created the traineddata and unicharset files in the output folder.
I hope I can now continue with the remaining instructions at
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
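
For anyone hitting the same "Could not find font" error, here is a minimal sketch of how the exact font name could be looked up before running tesstrain.sh. It assumes text2image from the tesseract training tools is on the PATH and that the fonts live under /usr/share/fonts as in this thread; match_font is a hypothetical helper, not part of tesseract.

```shell
#!/bin/sh
# Sketch: list the font names Pango reports, then pass the exact name to tesstrain.sh.
# FONTS_DIR mirrors the path quoted in this thread; adjust it to your setup.
FONTS_DIR="${FONTS_DIR:-/usr/share/fonts}"

# match_font PATTERN: keep only the stdin lines that contain PATTERN,
# ignoring case. Hypothetical helper for illustration only.
match_font() {
    grep -i -- "$1" || true
}

# Only call text2image if the training tools are installed.
if command -v text2image >/dev/null 2>&1; then
    text2image --list_available_fonts --fonts_dir="$FONTS_DIR" 2>&1 \
        | match_font "marathicursive"
fi

# The name printed above, including the style word, then goes to --fontlist
# as one quoted argument, with the other flags unchanged from the original
# command:
#   src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
#       --fontlist "MarathiCursiveT Medium" --lang mar --linedata_only \
#       --noextract_font_properties \
#       --langdata_dir ../tesstutorial/langdata \
#       --tessdata_dir ../tesstutorial/tesseract/tessdata \
#       --training_text ../tesstutorial/langdata/mar/mar.modi.training_text \
#       --output_dir ../tesstutorial/moditrain
```

Quoting the font name matters because it contains a space; unquoted, the shell would pass "Medium" as a separate argument.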

The box file is created with the command *tesseract A.png A lstmbox*,
where A.png is the image with Modi characters.
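
A note on the box file: when tesseract is run without -l it uses its default eng model, which would explain why the symbols come out as English characters. Each box file line has the form "symbol left bottom right top page", so the first column shows what was recognised. The following is a small sketch for listing the distinct symbols; box_symbols is a hypothetical helper and A.box is assumed to be the file written by the command above.

```shell
#!/bin/sh
# box_symbols FILE: print the distinct symbols found in the first column
# of a box file. Lines whose symbol is a space or tab (word and line
# separators) are skipped.
box_symbols() {
    cut -d' ' -f1 -- "$1" | grep -v '^[[:space:]]*$' | sort -u
}

# Assumption: the box file is named A.box; override with BOX_FILE=...
BOX_FILE="${BOX_FILE:-A.box}"
if [ -f "$BOX_FILE" ]; then
    box_symbols "$BOX_FILE"
fi
```

If the list contains only Latin letters, the box file text would have to be corrected by hand to the Modi characters before it is useful for training.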


On Tue, Jan 28, 2020 at 12:28 PM Shree Devi Kumar <shreesh...@gmail.com>
wrote:

>
> *MarathiCursiveT Medium*
> *Use the above as the font with tesstrain.sh*
>
> *How are you creating the box file for the image?*
>
>
> On Tue, Jan 28, 2020, 21:56 'Nilambari Joshi' via tesseract-ocr <
> tesseract-ocr@googlegroups.com> wrote:
>
>> I was trying to do this with an image. I found an image online with all the
>> Modi script characters and tried to create a box file for it.
>> In the box file I can see that each character is treated as an
>> English character.
>> *My question is how to make it recognise each one as a
>> Modi character.*
>>
>> Then I tried to use tesstrain.sh as below
>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts --fontlist
>> MarathiCursiveT --lang mar --linedata_only --noextract_font_properties
>> --langdata_dir ../tesstutorial/langdata --tessdata_dir
>> ../tesstutorial/tesseract/tessdata --training_text
>> ../tesstutorial/langdata/mar/mar.modi.training_text --output_dir
>> ../tesstutorial/moditrain
>>
>> I built (by running make) the MarathiCursiveT TrueType Unicode Modi font from
>> https://github.com/MihailJP/MarathiCursive, the link mentioned in
>> response to my query.
>> I kept that file at /usr/share/fonts/truetype/MarathiCursiveT
>>
>> I created mar.modi.training_text by pasting the content of the Marathi
>> training text file into the Aksharamukha app and taking the output text in Modi.
>>
>> *With tesstrain.sh I am getting the error: Could not find font named
>> 'MarathiCursiveT'. Pango suggested font 'MarathiCursiveT Medium'*
>>
>> Please advise on both queries. Thanks in advance.
>>
>> On Monday, January 27, 2020 at 3:22:17 AM UTC-5, shree wrote:
>>>
>>> For LSTM training punc, numbers, wordlist are NOT required. You can add
>>> them if you like. Unicharset is generated from the training text.
>>>
>>> Are you planning to train from text or images?
>>>
>>> On Mon, Jan 27, 2020 at 2:19 AM 'Nilambari Joshi' via tesseract-ocr <
>>> tesser...@googlegroups.com> wrote:
>>>
>>>> Thanks for your response. I will work as suggested. Please also clarify
>>>> whether I need to create a separate language directory for Modi, similar to
>>>> the Marathi one, with all the files (numbers, punc, wordlist) included and a
>>>> separate unicharset file as well.
>>>> Thanks in advance.
>>>>
>>>> On Sunday, January 26, 2020 at 12:26:51 PM UTC-5, shree wrote:
>>>>>
>>>>> Thanks for the link to Modi Unicode font.
>>>>>
>>>>> I would convert the Marathi training text to Modi script (use
>>>>> Aksharamukha) and then train using the unicode font.
>>>>>
>>>>> On Sun, Jan 26, 2020 at 10:28 PM Patrick CHEW <patri...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On Jan 26, 2020, at 08:16, Shree Devi Kumar <shree...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Is there a Unicode font for modi script?
>>>>>>
>>>>>>
>>>>>> https://github.com/MihailJP/MarathiCursive
>>>>>>
>>>>>> On Sun, Jan 26, 2020, 21:22 'Nilambari Joshi' via tesseract-ocr <
>>>>>> tesser...@googlegroups.com> wrote:
>>>>>>
>>>>>>> Hi, I want to create Modi script (Marathi language) traineddata in
>>>>>>> tesseract for OCR. Can somebody guide me on what steps I should follow?
>>>>>>> I referred to
>>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>>>>>>> but got stuck at the stage of creating box files.
>>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to tesser...@googlegroups.com.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/EB77DC11-4EBA-498C-A8AE-E728C3F82A4D%40gmail.com
>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/EB77DC11-4EBA-498C-A8AE-E728C3F82A4D%40gmail.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> ____________________________________________________________
>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>
>>>
>>>
>>>
