Why do you want to fine-tune eng to get a Hindi traineddata? You can fine-tune hin.traineddata or script/Devanagari.traineddata instead.
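
On the follow-up question quoted below: as far as I understand, Latin.unicharset, Devanagari.unicharset and radical-stroke.txt are shared support files that ship with the tesseract-ocr langdata_lstm repository. combine_lang_model looks for them in the directory passed as --script_dir. The script unicharsets supply the per-character properties that the "properties incomplete" warnings complain about, and radical-stroke.txt is read when the character encoding is built, even for non-Han languages. You should not need to write your own for a new language. You only need the unicharset for each script your text uses. Here is a small sketch for checking a script directory before running combine_lang_model. The function name missing_script_files is hypothetical (it is not a Tesseract tool), and the file list is my assumption based on the paths in the error log:

```shell
#!/bin/sh
# Hypothetical helper, not part of Tesseract: report which support files
# are absent from the directory that will be passed as --script_dir.
# Usage: missing_script_files DIR [SCRIPT.unicharset ...]
# Prints one missing path per line; prints nothing if everything is present.
missing_script_files() {
    script_dir=$1
    shift
    # radical-stroke.txt is always checked, because the log shows
    # combine_lang_model trying to read it for Hindi as well.
    for name in radical-stroke.txt "$@"; do
        [ -f "$script_dir/$name" ] || printf '%s\n' "$script_dir/$name"
    done
}

# Hindi text with ASCII punctuation touches two scripts, so check both files.
missing_script_files ./langdata Latin.unicharset Devanagari.unicharset
```

If this prints any paths, copy those files from langdata_lstm into the script directory and rerun the same combine_lang_model command.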

On Wed, Apr 8, 2020, 21:00 Piyush Chandra <[email protected]> wrote:

> When I downloaded Devanagari.unicharset, Latin.unicharset and
> radical-stroke.txt, it worked. What are these files and why do we need
> them? Do we need to use them every time we work on a new language, or do
> we need to create our own?
>
> On Wednesday, 8 April 2020 20:42:44 UTC+5:30, Piyush Chandra wrote:
>>
>> Hi,
>>
>> I am trying to create a Hindi traineddata from scratch using
>> eng.traineddata.
>>
>> I used some png and txt files to create box files using lstmbox and edited
>> those box files to correct the words.
>>
>> Then, I used lstm.train to create lstm files and created a unicharset file
>> from the box files using unicharset_extractor.
>>
>> But now, when I use combine_lang_model to get a starter traineddata file, I
>> am getting an error. Please help.
>>
>> ~/hindiFiles/hindi$ /usr/local/bin/combine_lang_model --input_unicharset
>> ./langdata/hin/hin.unicharset --script_dir ./langdata --words
>> ./langdata/hin.wordlist --numbers ./langdata/hin.numbers --puncs
>> ./langdata/hin.punc --output_dir /home/piyush/hindiFiles/hindi/langdata/
>> --lang hin
>> Loaded unicharset of size 39 from file ./langdata/hin/hin.unicharset
>> Setting unichar properties
>> Setting script properties
>> Failed to load script unicharset from:./langdata/Latin.unicharset
>> Failed to load script unicharset from:./langdata/Devanagari.unicharset
>> Warning: properties incomplete for index 3 = मे
>> Warning: properties incomplete for index 4 = रा
>> Warning: properties incomplete for index 5 = ना
>> Warning: properties incomplete for index 6 = म
>> Warning: properties incomplete for index 7 = पी
>> Warning: properties incomplete for index 8 = यू
>> Warning: properties incomplete for index 9 = ष
>> Warning: properties incomplete for index 10 = है
>> Warning: properties incomplete for index 11 = ।
>> Warning: properties incomplete for index 12 = हाँ
>> Warning: properties incomplete for index 13 = ,
>> Warning: properties incomplete for index 14 = मु
>> Warning: properties incomplete for index 15 = झे
>> Warning: properties incomplete for index 16 = भू
>> Warning: properties incomplete for index 17 = ख
>> Warning: properties incomplete for index 18 = ल
>> Warning: properties incomplete for index 19 = गी
>> Warning: properties incomplete for index 20 = तु
>> Warning: properties incomplete for index 21 = म्
>> Warning: properties incomplete for index 22 = हा
>> Warning: properties incomplete for index 23 = क्
>> Warning: properties incomplete for index 24 = या
>> Warning: properties incomplete for index 25 = कै
>> Warning: properties incomplete for index 26 = से
>> Warning: properties incomplete for index 27 = हो
>> Warning: properties incomplete for index 28 = ?
>> Warning: properties incomplete for index 29 = क
>> Warning: properties incomplete for index 30 = ब
>> Warning: properties incomplete for index 31 = त
>> Warning: properties incomplete for index 32 = आ
>> Warning: properties incomplete for index 33 = ओ
>> Warning: properties incomplete for index 34 = गे
>> Warning: properties incomplete for index 35 = नीं
>> Warning: properties incomplete for index 36 = द
>> Warning: properties incomplete for index 37 = र
>> Warning: properties incomplete for index 38 = ही
>> Config file is optional, continuing...
>> Failed to read data from: ./langdata/hin/hin.config
>> Failed to read data from: ./langdata/radical-stroke.txt
>> Error reading radical code table ./langdata/radical-stroke.txt
>>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/77cf0099-a40e-4186-b76c-b844832e2240%40googlegroups.com

