I first tried working from an image. I found an image online containing all 
the Modi script characters and tried to create a box file for it. 
In the box file I can see that each character is being treated as an English 
character. 
*My question is: how do I make it recognise each one as a Modi character?*

Then I tried to use tesstrain.sh as below
src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
  --fontlist MarathiCursiveT --lang mar --linedata_only \
  --noextract_font_properties \
  --langdata_dir ../tesstutorial/langdata \
  --tessdata_dir ../tesstutorial/tesseract/tessdata \
  --training_text ../tesstutorial/langdata/mar/mar.modi.training_text \
  --output_dir ../tesstutorial/moditrain

I built the MarathiCursiveT TrueType Unicode Modi font (by running make) from 
https://github.com/MihailJP/MarathiCursive, the link given in response to 
my query.
I placed that file at /usr/share/fonts/truetype/MarathiCursiveT 

I created mar.modi.training_text by pasting the contents of the Marathi 
training text file into the Aksharamukha app and taking its output in Modi.
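
To rule out a problem with the conversion itself, a quick check like the one below could confirm that the converted file really contains characters from the Unicode Modi block (U+11600 to U+1165F) rather than leftover Devanagari (U+0900 to U+097F). This is only a rough sketch: the function name `script_counts` and the script name are made up for illustration and are not part of Tesseract.

```python
import sys

# Unicode block ranges (inclusive).
MODI = (0x11600, 0x1165F)       # "Modi" block
DEVANAGARI = (0x0900, 0x097F)   # "Devanagari" block


def script_counts(text):
    """Count non-whitespace characters by Unicode block (hypothetical helper)."""
    counts = {"modi": 0, "devanagari": 0, "other": 0}
    for ch in text:
        if ch.isspace():
            continue
        cp = ord(ch)
        if MODI[0] <= cp <= MODI[1]:
            counts["modi"] += 1
        elif DEVANAGARI[0] <= cp <= DEVANAGARI[1]:
            counts["devanagari"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python check_script.py ../tesstutorial/langdata/mar/mar.modi.training_text
    with open(sys.argv[1], encoding="utf-8") as f:
        print(script_counts(f.read()))
```

If the "devanagari" count is large compared with the "modi" count, the text was probably not converted as intended.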

*For tesstrain.sh I am getting the error: Could not find font named 
'MarathiCursiveT'. Pango suggested font 'MarthiCursiveT Medium'*
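
One way to see which name the system actually reports for the font is to list the installed families with fontconfig's `fc-list` and look for close matches to the name passed in --fontlist. The sketch below assumes `fc-list` is installed and sees the same fonts that tesstrain.sh does, which may not hold on every setup. The helper names `installed_font_families` and `closest_fonts` are made up for illustration.

```python
import difflib
import shutil
import subprocess


def installed_font_families():
    """Return family names reported by `fc-list`, or [] if fc-list is missing."""
    if shutil.which("fc-list") is None:
        return []
    out = subprocess.run(
        ["fc-list", ":", "family"], capture_output=True, text=True, check=False
    ).stdout
    families = set()
    for line in out.splitlines():
        # A single font can report several comma-separated family names.
        families.update(name.strip() for name in line.split(",") if name.strip())
    return sorted(families)


def closest_fonts(requested, families, n=3):
    """Return up to n names from `families` that closely resemble `requested`."""
    return difflib.get_close_matches(requested, families, n=n, cutoff=0.6)


if __name__ == "__main__":
    # Prints near matches for the name used in --fontlist (possibly an empty list).
    print(closest_fonts("MarathiCursiveT", installed_font_families()))
```

Whatever exact name turns up would then be the value to try in --fontlist, quoted if it contains spaces.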

Please advise on both queries. Thanks in advance.

On Monday, January 27, 2020 at 3:22:17 AM UTC-5, shree wrote:
>
> For LSTM training punc, numbers, wordlist are NOT required. You can add 
> them if you like. Unicharset is generated from the training text.
>
> Are you planning to train from text or images?
>
> On Mon, Jan 27, 2020 at 2:19 AM 'Nilambari Joshi' via tesseract-ocr <
> tesser...@googlegroups.com> wrote:
>
>> Thanks for your response. I will work as suggested. Please also clarify 
>> whether I need to create a separate language directory for Modi, similar to 
>> the one for Marathi, with all the files (number, punc, wordlist) included and 
>> a separate unicharset file as well.
>> Thanks in advance.
>>
>> On Sunday, January 26, 2020 at 12:26:51 PM UTC-5, shree wrote:
>>>
>>> Thanks for the link to Modi Unicode font.
>>>
>>> I would convert the Marathi training text to Modi script (use 
>>> Aksharamukha) and then train using the unicode font.
>>>
>>> On Sun, Jan 26, 2020 at 10:28 PM Patrick CHEW <patri...@gmail.com> 
>>> wrote:
>>>
>>>>
>>>> On Jan 26, 2020, at 08:16, Shree Devi Kumar <shree...@gmail.com> wrote:
>>>>
>>>> Is there a Unicode font for modi script?
>>>>
>>>>
>>>> https://github.com/MihailJP/MarathiCursive
>>>>
>>>> On Sun, Jan 26, 2020, 21:22 'Nilambari Joshi' via tesseract-ocr <
>>>> tesser...@googlegroups.com> wrote:
>>>>
>>>>> Hi... I want to create Modi script (Marathi language) traineddata in 
>>>>> Tesseract for OCR. Can somebody advise which steps I should follow?
>>>>> I referred to 
>>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 
>>>>> but got stuck at the stage of creating box files.
>>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to tesser...@googlegroups.com.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/EB77DC11-4EBA-498C-A8AE-E728C3F82A4D%40gmail.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/EB77DC11-4EBA-498C-A8AE-E728C3F82A4D%40gmail.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>
>>>
>>> -- 
>>>
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>
>
