Okay. Thank you very much.
But will overfitting happen with 36000 iterations?
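
One way to check for overfitting is to compare the training error against the error on held-out lines that were never used for training. The sketch below runs lstmeval on each saved checkpoint. It assumes the checkpoints sit next to the --model_output prefix with names ending in "checkpoint", and that a separate evaluation list file exists; the function name eval_checkpoints and the eval list path in the usage note are hypothetical, not from this thread.

```shell
#!/bin/sh
# Run lstmeval on every checkpoint that shares a --model_output prefix.
# Usage: eval_checkpoints MODEL_PREFIX TRAINEDDATA EVAL_LISTFILE
# LSTMEVAL can be overridden (e.g. for a dry run); it defaults to lstmeval.
eval_checkpoints() {
  prefix=$1
  traineddata=$2
  eval_list=$3
  for ckpt in "$prefix"*checkpoint; do
    # Skip the unexpanded pattern when no checkpoint matches.
    [ -e "$ckpt" ] || continue
    echo "== $ckpt"
    "${LSTMEVAL:-lstmeval}" --model "$ckpt" \
      --traineddata "$traineddata" \
      --eval_listfile "$eval_list"
  done
}
```

For example: eval_checkpoints ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_layer ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata ~/tesstutorial/chi_sim_eval/chi_sim.training_files.txt (the last path is an assumed held-out set). If the evaluation error starts rising at later checkpoints while the training error keeps falling, that would suggest overfitting.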

Shree Devi Kumar <[email protected]> wrote on Monday, March 25, 2019, at 11:43 PM:

> 36000 iterations, error rate 0.1
>
> OCR output attached
>
>
> On Mon, Mar 25, 2019 at 6:09 PM Shree Devi Kumar <[email protected]>
> wrote:
>
>> Try replacing a layer. You may need a larger training_text and more
>> iterations.
>>
>> lstmtraining \
>> --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_layer \
>> --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm \
>> --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>> --append_index 5 --net_spec '[Lfx192 O1c1]' \
>> --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>> --max_iterations 30000
>>
>> On Mon, Mar 25, 2019 at 4:14 PM 易鑫 <[email protected]> wrote:
>>
>>> Hello, everyone:
>>>   I have been focused on training eng + chi_sim for several days, but one
>>> urgent issue confuses me. I have asked these questions before but did not
>>> get a good reply, so I am asking again. Sorry for disturbing you.
>>>
>>> My steps are as follows:
>>>
>>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>>> --training_text ../training_data/chi_sim_tuned.txt \
>>> --langdata_dir ../langdata --tessdata_dir ./tessdata --lang chi_sim \
>>> --linedata_only --noextract_font_properties --exposures "0" \
>>> --workspace_dir ./share/workspace/tmp \
>>> --save_box_tiff \
>>> --fontlist "NSimSun" \
>>>     "Times New Roman" \
>>>     "Arial Unicode MS" \
>>>     "SimSun" \
>>>     "Merchant Copy" \
>>>     "Merchant Copy Doublesize" \
>>>     "Noto Sans CJK SC" \
>>>     "Noto Sans Mono CJK SC" \
>>> --output_dir ~/tesstutorial/chi_sim_train \
>>> --overwrite
>>>
>>>
>>> mkdir -p ~/tesstutorial/chi_sim_tuned_from_chi_sim
>>>
>>>
>>>
>>> combine_tessdata -e ../tessdata_best/chi_sim.traineddata \
>>> ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm
>>>
>>>
>>> lstmtraining \
>>> --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned \
>>> --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm \
>>> --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>> --old_traineddata ../tessdata_best/chi_sim.traineddata \
>>> --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>>> --max_iterations 3000
>>>
>>> lstmtraining --stop_training \
>>> --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned_checkpoint \
>>> --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>> --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned.traineddata
>>>
>>> The train_text file is attached.
>>>
>>> What confuses me is that the result contains some characters that are
>>> not in the train_text file (only chi_sim characters have the problem; eng
>>> is OK).
>>>
>>> Can anyone help me? I have also uploaded the image and the result file.
>>> Thanks in advance.
>>>
>>> Thank you.
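
To confirm which characters in the OCR result are missing from the training text, the two files can be compared character by character. This is a minimal sketch, assuming both files are UTF-8 and the shell runs in a UTF-8 locale so that `.` in grep matches a whole character; the function names are hypothetical.

```shell
#!/bin/sh
# Print each distinct character of a text file on its own line, byte-sorted.
unique_chars() {
  grep -o . "$1" | LC_ALL=C sort -u
}

# List characters that occur in the OCR result but not in the training text.
# Usage: chars_not_in_training TRAINING_TEXT OCR_RESULT
chars_not_in_training() {
  train_chars=$(mktemp)
  ocr_chars=$(mktemp)
  unique_chars "$1" > "$train_chars"
  unique_chars "$2" > "$ocr_chars"
  # comm -13 keeps only the lines unique to the second file.
  LC_ALL=C comm -13 "$train_chars" "$ocr_chars"
  rm -f "$train_chars" "$ocr_chars"
}
```

For example, chars_not_in_training ../training_data/chi_sim_tuned.txt result.txt would print one unexpected character per line, where result.txt stands for whatever file holds the OCR output.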
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/4af9e1d1-218a-4a36-8a77-1b4619b53205%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/4af9e1d1-218a-4a36-8a77-1b4619b53205%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>> --
>>
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>
>
