Also, how many lines of training_text would be better? The total number of
characters in my text is no more than 100.
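
For reference, a minimal sketch of how the size of a training_text file could be
measured before deciding whether it needs to grow. The function names are
hypothetical helpers, not part of Tesseract, and the character count assumes the
shell runs in a UTF-8 locale so that each Chinese character is matched as one
character.

```shell
#!/bin/sh
# Sketch: basic statistics for a training_text file.
# Assumption: a UTF-8 locale, so multi-byte characters are matched whole.

# Print the number of lines in the file given as $1.
count_lines() {
    wc -l < "$1" | tr -d ' '
}

# Print the number of distinct non-whitespace characters in the file $1.
# grep -o emits one character per line; sorting bytewise (LC_ALL=C) avoids
# locale collation treating two different characters as equal.
count_distinct_chars() {
    grep -o '[^[:space:]]' "$1" | LC_ALL=C sort -u | wc -l | tr -d ' '
}

# Usage, with the training_text path from the tesstrain.sh command in this thread:
#   count_lines ../training_data/chi_sim_tuned.txt
#   count_distinct_chars ../training_data/chi_sim_tuned.txt
```

The two numbers together show how often each character can appear at most: few
lines spread over many distinct characters means few samples per character.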

On Tue, Mar 26, 2019 at 9:50 AM 易鑫 <[email protected]> wrote:

> Okay. Thank you very much.
> But will overfitting happen with 36000 iterations?
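
One common way to look for overfitting is to evaluate a checkpoint on line data
that was not used for training and compare that error rate with the training
error. The sketch below wraps the lstmeval tool that ships with the Tesseract
training tools. The helper name and the held-out list file are assumptions: this
thread only mentions a training list, so a separate evaluation set would have to
be generated first.

```shell
#!/bin/sh
# Sketch: evaluate one checkpoint on held-out .lstmf files.
# $1 = checkpoint written by lstmtraining
# $2 = the .traineddata used for training
# $3 = list file naming the held-out .lstmf files (hypothetical, not in this thread)
eval_checkpoint() {
    lstmeval --model "$1" --traineddata "$2" --eval_listfile "$3"
}

# Usage with the paths from this thread; the chi_sim_eval directory is assumed:
#   eval_checkpoint \
#       ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_layer_checkpoint \
#       ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
#       ~/tesstutorial/chi_sim_eval/chi_sim.training_files.txt
```

If the held-out error is much higher than the training error, or rises while the
training error keeps falling, that suggests overfitting.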
>
> On Mon, Mar 25, 2019 at 11:43 PM Shree Devi Kumar <[email protected]> wrote:
>
>> 36000 iterations, error rate 0.1
>>
>> OCR output attached
>>
>>
>> On Mon, Mar 25, 2019 at 6:09 PM Shree Devi Kumar <[email protected]>
>> wrote:
>>
>>> Try replacing a layer. You may need a larger training_text and more
>>> iterations.
>>>
>>> lstmtraining \
>>> --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_layer \
>>> --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm \
>>> --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>> --append_index 5 --net_spec '[Lfx192 O1c1]' \
>>> --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>>> --max_iterations 30000
>>>
>>> On Mon, Mar 25, 2019 at 4:14 PM 易鑫 <[email protected]> wrote:
>>>
>>>> Hello, everyone:
>>>>   I have been working on training eng + chi_sim for several days, but
>>>> one urgent issue confuses me. I have asked this question before but did
>>>> not get a good reply, so I am asking again. Sorry for disturbing you.
>>>>
>>>> My steps are as follows:
>>>>
>>>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>>>> --training_text ../training_data/chi_sim_tuned.txt \
>>>> --langdata_dir ../langdata --tessdata_dir ./tessdata --lang chi_sim \
>>>> --linedata_only --noextract_font_properties --exposures "0" \
>>>> --workspace_dir ./share/workspace/tmp \
>>>> --save_box_tiff \
>>>> --fontlist "NSimSun" \
>>>>     "Times New Roman" \
>>>>     "Arial Unicode MS" \
>>>>     "SimSun" \
>>>>     "Merchant Copy" \
>>>>     "Merchant Copy Doublesize" \
>>>>     "Noto Sans CJK SC" \
>>>>     "Noto Sans Mono CJK SC" \
>>>> --output_dir ~/tesstutorial/chi_sim_train \
>>>> --overwrite
>>>>
>>>>
>>>> mkdir -p ~/tesstutorial/chi_sim_tuned_from_chi_sim
>>>>
>>>>
>>>>
>>>> combine_tessdata -e ../tessdata_best/chi_sim.traineddata \
>>>> ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm
>>>>
>>>>
>>>> lstmtraining \
>>>> --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned \
>>>> --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm \
>>>> --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>>> --old_traineddata ../tessdata_best/chi_sim.traineddata \
>>>> --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>>>> --max_iterations 3000
>>>>
>>>> lstmtraining --stop_training \
>>>> --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned_checkpoint \
>>>> --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>>> --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned.traineddata
>>>>
>>>> The training_text file is attached.
>>>>
>>>>
>>>> What confuses me is that the result contains some characters that are
>>>> not in the training_text file (only chi_sim characters have this
>>>> problem; eng is OK).
>>>>
>>>> Can anyone help me? Thanks a lot.
>>>> I have also uploaded the image and the result file. Thanks in advance.
>>>>
>>>> Thank you.
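
When output contains characters that were not in the training text, one thing
that can be checked is the character set stored in the model itself. The sketch
below unpacks a .traineddata file with combine_tessdata -u and prints the first
line of its LSTM unicharset, which holds the number of entries. The helper name
and the output prefix are hypothetical; it does not by itself explain where the
extra characters come from.

```shell
#!/bin/sh
# Sketch: report how many entries the LSTM unicharset of a model contains.
# $1 = path to a .traineddata file
# $2 = output prefix for the unpacked components (hypothetical, e.g. /tmp/chi_sim_tuned.)
unicharset_size() {
    # combine_tessdata -u writes each component as <prefix><component name>.
    combine_tessdata -u "$1" "$2" >/dev/null &&
    # The first line of a unicharset file is its entry count.
    head -n 1 "${2}lstm-unicharset"
}

# Usage with the fine-tuned model from this thread:
#   unicharset_size \
#       ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned.traineddata \
#       /tmp/chi_sim_tuned.
```

Running it on both the fine-tuned model and ../tessdata_best/chi_sim.traineddata
shows whether the fine-tuned model was really reduced to the characters of the
training text. The unpacked lstm-unicharset file can also be searched for one of
the unexpected characters.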
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/4af9e1d1-218a-4a36-8a77-1b4619b53205%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/4af9e1d1-218a-4a36-8a77-1b4619b53205%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>> --
>>>
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>
>>
>
