Hi Shree,

I am learning how to create traineddata for new languages, and I would also 
like to contribute to Tesseract. 

I have followed all your posts as well as your projects on GitHub. (I wanted 
to thank you for helping and contributing so much online :))

I have already tried fine-tuning the English model. Is there any information 
about why we need these files (Devanagari.unicharset, Latin.unicharset and 
radical-stroke.txt)? And do we need to use these files for a new language 
like Chhattisgarhi, or any other language that is not yet available for 
Tesseract? 
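
For context, the combine_lang_model log quoted further down shows the tool 
looking for all three files directly under the directory passed as 
--script_dir. Below is a minimal sketch of a check to run before 
combine_lang_model, assuming that layout. The function name check_script_dir 
and its arguments are hypothetical, made up for illustration; they are not 
part of Tesseract.

```shell
#!/bin/sh
# Hypothetical helper (not a Tesseract tool): report which support files
# are present in a directory meant for combine_lang_model's --script_dir.
# Assumption: the files sit directly under that directory, as the paths in
# the quoted log suggest (e.g. ./langdata/Latin.unicharset).
#
# Usage: check_script_dir DIR SCRIPT...
# Example: check_script_dir ./langdata Latin Devanagari
check_script_dir() {
    dir=$1
    shift
    status=0

    # radical-stroke.txt is checked first, then one <Script>.unicharset
    # for every script name given on the command line.
    for file in radical-stroke.txt; do
        if [ -f "$dir/$file" ]; then
            echo "found:   $dir/$file"
        else
            echo "missing: $dir/$file"
            status=1
        fi
    done

    for script in "$@"; do
        file=$script.unicharset
        if [ -f "$dir/$file" ]; then
            echo "found:   $dir/$file"
        else
            echo "missing: $dir/$file"
            status=1
        fi
    done

    # Non-zero exit status if anything was missing.
    return $status
}
```

Called with the directory from the log and the script names Latin and 
Devanagari, it prints one found/missing line per file and returns non-zero 
if any of them is absent.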

Any help will be appreciated.

On Wednesday, 8 April 2020 21:58:37 UTC+5:30, shree wrote:
>
> Why do you want to fine-tune eng to get to hindi traineddata?
>
> You can fine-tune hin.traineddata or script/Devanagari.traineddata.
>
> On Wed, Apr 8, 2020, 21:00 Piyush Chandra <[email protected]> wrote:
>
>> It worked once I downloaded Devanagari.unicharset, Latin.unicharset and 
>> radical-stroke.txt. What are these files and why do we need them? Do we 
>> need to use them every time we work on a new language, or do we need to 
>> create our own?
>>
>>
>> On Wednesday, 8 April 2020 20:42:44 UTC+5:30, Piyush Chandra wrote:
>>>
>>> Hi,
>>>
>>> I am trying to create Hindi traineddata from scratch using 
>>> eng.traineddata.
>>>
>>> I used some png and txt files to create box files with lstmbox, and 
>>> edited those box files to correct the words.
>>>
>>> Then I used lstm.train to create the .lstmf files, and created a 
>>> unicharset file from the box files using unicharset_extractor.
>>>
>>> But now, when I use combine_lang_model to get a starter traineddata file, 
>>> I get an error. Please help.
>>>
>>> ~/hindiFiles/hindi$ /usr/local/bin/combine_lang_model --input_unicharset 
>>> ./langdata/hin/hin.unicharset --script_dir ./langdata --words 
>>> ./langdata/hin.wordlist --numbers ./langdata/hin.numbers --puncs 
>>> ./langdata/hin.punc --output_dir /home/piyush/hindiFiles/hindi/langdata/ 
>>> --lang hin
>>> Loaded unicharset of size 39 from file ./langdata/hin/hin.unicharset
>>> Setting unichar properties
>>> Setting script properties
>>> Failed to load script unicharset from:./langdata/Latin.unicharset
>>> Failed to load script unicharset from:./langdata/Devanagari.unicharset
>>> Warning: properties incomplete for index 3 = मे
>>> Warning: properties incomplete for index 4 = रा
>>> Warning: properties incomplete for index 5 = ना
>>> Warning: properties incomplete for index 6 = म
>>> Warning: properties incomplete for index 7 = पी
>>> Warning: properties incomplete for index 8 = यू
>>> Warning: properties incomplete for index 9 = ष
>>> Warning: properties incomplete for index 10 = है
>>> Warning: properties incomplete for index 11 = ।
>>> Warning: properties incomplete for index 12 = हाँ
>>> Warning: properties incomplete for index 13 = ,
>>> Warning: properties incomplete for index 14 = मु
>>> Warning: properties incomplete for index 15 = झे
>>> Warning: properties incomplete for index 16 = भू
>>> Warning: properties incomplete for index 17 = ख
>>> Warning: properties incomplete for index 18 = ल
>>> Warning: properties incomplete for index 19 = गी
>>> Warning: properties incomplete for index 20 = तु
>>> Warning: properties incomplete for index 21 = म्‌
>>> Warning: properties incomplete for index 22 = हा
>>> Warning: properties incomplete for index 23 = क्‌
>>> Warning: properties incomplete for index 24 = या
>>> Warning: properties incomplete for index 25 = कै
>>> Warning: properties incomplete for index 26 = से
>>> Warning: properties incomplete for index 27 = हो
>>> Warning: properties incomplete for index 28 = ?
>>> Warning: properties incomplete for index 29 = क
>>> Warning: properties incomplete for index 30 = ब
>>> Warning: properties incomplete for index 31 = त
>>> Warning: properties incomplete for index 32 = आ
>>> Warning: properties incomplete for index 33 = ओ
>>> Warning: properties incomplete for index 34 = गे
>>> Warning: properties incomplete for index 35 = नीं
>>> Warning: properties incomplete for index 36 = द
>>> Warning: properties incomplete for index 37 = र
>>> Warning: properties incomplete for index 38 = ही
>>> Config file is optional, continuing...
>>> Failed to read data from: ./langdata/hin/hin.config
>>> Failed to read data from: ./langdata/radical-stroke.txt
>>> Error reading radical code table ./langdata/radical-stroke.txt

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/aadfb8a5-f3b7-4ab1-93c1-d0381d6ab3f3%40googlegroups.com.
