*MarthiCursiveT Medium*
*Use the above as the font with tesstrain.sh*

*How are you creating the box file for the image?*

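For the font error, here is a minimal sketch of how the call could look, assuming the paths from your message and the font name exactly as Pango reported it. The `run` helper and the `RUN` variable are my own additions for illustration (hypothetical, not part of Tesseract). The `text2image --list_available_fonts` step is an optional check of the names Pango can actually see.

```shell
#!/bin/sh
# Font name exactly as Pango suggested it in the error message.
FONT="MarthiCursiveT Medium"

# Hypothetical helper for this sketch: prints the command by default,
# and only executes it when RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Optional check: list the font names Pango can see under the fonts directory.
run text2image --fonts_dir /usr/share/fonts --list_available_fonts

# Same tesstrain.sh call as in the quoted message, with only --fontlist changed.
run src/training/tesstrain.sh \
  --fonts_dir /usr/share/fonts \
  --fontlist "$FONT" \
  --lang mar --linedata_only --noextract_font_properties \
  --langdata_dir ../tesstutorial/langdata \
  --tessdata_dir ../tesstutorial/tesseract/tessdata \
  --training_text ../tesstutorial/langdata/mar/mar.modi.training_text \
  --output_dir ../tesstutorial/moditrain
```

As written this only prints the two commands for inspection; set `RUN=1` to execute them. The quotes around `"$FONT"` matter because the font name contains a space.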

On Tue, Jan 28, 2020, 21:56 'Nilambari Joshi' via tesseract-ocr <
tesseract-ocr@googlegroups.com> wrote:

> I was trying to do this with an image. I found an image online with all the
> Modi script characters and tried to create a box file for it.
> In the box file I can see that each character is being treated as an English
> character.
> *My question is how to make it recognise each one as a
> Modi character.*
>
> Then I tried to use tesstrain.sh as below:
> src/training/tesstrain.sh --fonts_dir /usr/share/fonts --fontlist
> MarathiCursiveT --lang mar --linedata_only --noextract_font_properties
> --langdata_dir ../tesstutorial/langdata --tessdata_dir
> ../tesstutorial/tesseract/tessdata --training_text
> ../tesstutorial/langdata/mar/mar.modi.training_text --output_dir
> ../tesstutorial/moditrain
>
> I built (by running make) the MarathiCursiveT TrueType Unicode Modi font
> from https://github.com/MihailJP/MarathiCursive, the link mentioned in
> response to my query.
> I placed that file at /usr/share/fonts/truetype/MarathiCursiveT
>
> I created mar.modi.training_text by copying the content of the Marathi
> training text file into the Aksharamukha app and taking the output text in Modi.
>
> *For tesstrain.sh I am getting the error: Could not find font named
> 'MarathiCursiveT'. Pango suggested font 'MarthiCursiveT Medium'*
>
> Please advise on both queries. Thanks in advance.
>
> On Monday, January 27, 2020 at 3:22:17 AM UTC-5, shree wrote:
>>
>> For LSTM training, the punc, numbers, and wordlist files are NOT required.
>> You can add them if you like. The unicharset is generated from the training text.
>>
>> Are you planning to train from text or images?
>>
>> On Mon, Jan 27, 2020 at 2:19 AM 'Nilambari Joshi' via tesseract-ocr <
>> tesser...@googlegroups.com> wrote:
>>
>>> Thanks for your response. I will work as suggested. Please also clarify
>>> whether I need to create a separate language directory for Modi, similar to
>>> Marathi, with all files like numbers, punc, and wordlist included, and a
>>> separate unicharset file as well.
>>> Thanks in advance.
>>>
>>> On Sunday, January 26, 2020 at 12:26:51 PM UTC-5, shree wrote:
>>>>
>>>> Thanks for the link to Modi Unicode font.
>>>>
>>>> I would convert the Marathi training text to Modi script (use
>>>> Aksharamukha) and then train using the Unicode font.
>>>>
>>>> On Sun, Jan 26, 2020 at 10:28 PM Patrick CHEW <patri...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> On Jan 26, 2020, at 08:16, Shree Devi Kumar <shree...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Is there a Unicode font for Modi script?
>>>>>
>>>>>
>>>>> https://github.com/MihailJP/MarathiCursive
>>>>>
>>>>> On Sun, Jan 26, 2020, 21:22 'Nilambari Joshi' via tesseract-ocr <
>>>>> tesser...@googlegroups.com> wrote:
>>>>>
>>>>>> Hi... I want to create Modi script (Marathi language) traineddata in
>>>>>> Tesseract for OCR. Can somebody advise what steps I should follow?
>>>>>> I referred to
>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>>>>>> but got stuck at the stage of creating box files.
>>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesser...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/EB77DC11-4EBA-498C-A8AE-E728C3F82A4D%40gmail.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/EB77DC11-4EBA-498C-A8AE-E728C3F82A4D%40gmail.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>
>>>
>>
>>
>>
>

