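*MarthiCursiveT Medium*

*Use the above as the font name with tesstrain.sh.* That is the name Pango reports for the font. Note the spelling "Marthi", not "Marathi". Because the name contains a space, it must be quoted on the command line, e.g. --fontlist "MarthiCursiveT Medium".

*How are you creating the box file for the image?*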
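A minimal sketch, assuming Python 3.8+ (for shlex.join), of two sanity checks for this setup. tesstrain_command() rebuilds the tesstrain.sh call from this thread with the font name Pango suggested, shell-quoted so the space survives. modi_fraction() estimates how much of a text really lies in the Modi Unicode block (U+11600 to U+1165F), which helps confirm that the Aksharamukha output is Modi and not still Devanagari or Latin. Both function names are hypothetical helpers written for illustration, not part of Tesseract; the paths are copied from the command quoted below.

```python
import shlex

# Modi Unicode block: U+11600..U+1165F
MODI_START, MODI_END = 0x11600, 0x1165F


def modi_fraction(text: str) -> float:
    """Hypothetical helper: fraction of non-whitespace chars in the Modi block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    modi = sum(1 for c in chars if MODI_START <= ord(c) <= MODI_END)
    return modi / len(chars)


def tesstrain_command(font_name: str) -> str:
    """Hypothetical helper: the tesstrain.sh call from this thread, shell-quoted."""
    args = [
        "src/training/tesstrain.sh",
        "--fonts_dir", "/usr/share/fonts",
        "--fontlist", font_name,  # a name with a space becomes one quoted word
        "--lang", "mar",
        "--linedata_only",
        "--noextract_font_properties",
        "--langdata_dir", "../tesstutorial/langdata",
        "--tessdata_dir", "../tesstutorial/tesseract/tessdata",
        "--training_text", "../tesstutorial/langdata/mar/mar.modi.training_text",
        "--output_dir", "../tesstutorial/moditrain",
    ]
    # shlex.join quotes only the arguments that need it
    return shlex.join(args)


if __name__ == "__main__":
    print(tesstrain_command("MarthiCursiveT Medium"))
```

To check the training text, something like modi_fraction(open("mar.modi.training_text", encoding="utf-8").read()) should return a value well above zero. A result near 0.0 suggests the text is not in Modi codepoints. ASCII punctuation also counts against the fraction, so treat it as a rough signal, not an exact test.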
On Tue, Jan 28, 2020, 21:56 'Nilambari Joshi' via tesseract-ocr <tesseract-ocr@googlegroups.com> wrote:

> I was trying to do it with an image. I found an image online with all Modi script
> characters and tried to create a box file for that image.
> In the box file I can see that it treats each character as an English
> character.
> *My question is how to make it recognise that it should treat it as a
> Modi character.*
>
> Then I tried to use tesstrain.sh as below:
> src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>   --fontlist MarathiCursiveT --lang mar --linedata_only \
>   --noextract_font_properties --langdata_dir ../tesstutorial/langdata \
>   --tessdata_dir ../tesstutorial/tesseract/tessdata \
>   --training_text ../tesstutorial/langdata/mar/mar.modi.training_text \
>   --output_dir ../tesstutorial/moditrain
>
> I built the MarathiCursiveT TrueType Unicode Modi font (by running make) from
> https://github.com/MihailJP/MarathiCursive, the link mentioned in
> response to my query, and placed it at /usr/share/fonts/truetype/MarathiCursiveT.
>
> I created mar.modi.training_text by pasting the content of the Marathi training
> text file into the Aksharamukha app and taking the output text in Modi.
>
> *For tesstrain.sh I am getting the error: Could not find font named
> 'MarathiCursiveT'. Pango suggested font 'MarthiCursiveT Medium'*
>
> Please advise on both queries. Thanks in advance.
>
> On Monday, January 27, 2020 at 3:22:17 AM UTC-5, shree wrote:
>
>> For LSTM training, the punc, numbers and wordlist files are NOT required. You can
>> add them if you like. The unicharset is generated from the training text.
>>
>> Are you planning to train from text or images?
>>
>> On Mon, Jan 27, 2020 at 2:19 AM 'Nilambari Joshi' via tesseract-ocr <
>> tesser...@googlegroups.com> wrote:
>>
>>> Thanks for your response. I will work as suggested. Please also clarify
>>> whether I need to create a separate language directory for Modi, similar to
>>> Marathi, with all files such as numbers, punc and wordlist included, and a
>>> separate unicharset file as well?
>>> Thanks in advance.
>>>
>>> On Sunday, January 26, 2020 at 12:26:51 PM UTC-5, shree wrote:
>>>
>>>> Thanks for the link to the Modi Unicode font.
>>>>
>>>> I would convert the Marathi training text to Modi script (use
>>>> Aksharamukha) and then train using the Unicode font.
>>>>
>>>> On Sun, Jan 26, 2020 at 10:28 PM Patrick CHEW <patri...@gmail.com>
>>>> wrote:
>>>>
>>>>> On Jan 26, 2020, at 08:16, Shree Devi Kumar <shree...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Is there a Unicode font for Modi script?
>>>>>
>>>>> https://github.com/MihailJP/MarathiCursive
>>>>>
>>>>> On Sun, Jan 26, 2020, 21:22 'Nilambari Joshi' via tesseract-ocr <
>>>>> tesser...@googlegroups.com> wrote:
>>>>>
>>>>>> Hi... I want to create Modi script (Marathi language) traineddata in
>>>>>> Tesseract for OCR. Can somebody guide me on what steps I should follow?
>>>>>> I referred to
>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>>>>>> but got stuck at the stage of creating box files.
>>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesser...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/EB77DC11-4EBA-498C-A8AE-E728C3F82A4D%40gmail.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/EB77DC11-4EBA-498C-A8AE-E728C3F82A4D%40gmail.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> >>>> >>>> >>>> -- >>>> >>>> ____________________________________________________________ >>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com >>>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "tesseract-ocr" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to tesser...@googlegroups.com. >>> To view this discussion on the web visit >>> https://groups.google.com/d/msgid/tesseract-ocr/3d481093-8efd-408c-abcc-758c6c72df32%40googlegroups.com >>> <https://groups.google.com/d/msgid/tesseract-ocr/3d481093-8efd-408c-abcc-758c6c72df32%40googlegroups.com?utm_medium=email&utm_source=footer> >>> . >>> >> >> >> -- >> >> ____________________________________________________________ >> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com >> > -- > You received this message because you are subscribed to the Google Groups > "tesseract-ocr" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to tesseract-ocr+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/tesseract-ocr/b65c4a9d-ea7c-44af-956e-e9628ba05ee4%40googlegroups.com > <https://groups.google.com/d/msgid/tesseract-ocr/b65c4a9d-ea7c-44af-956e-e9628ba05ee4%40googlegroups.com?utm_medium=email&utm_source=footer> > . > -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduWX9WmC%3DXbVCRAM9qJd2UB65_QafyimqOg3X7GoVbbqfQ%40mail.gmail.com.