Okay, thank you very much. But will overfitting happen with 36000 iterations?

On Mon, Mar 25, 2019 at 11:43 PM Shree Devi Kumar <[email protected]> wrote:
> 36000 iterations, error rate 0.1
>
> OCR output attached
>
> On Mon, Mar 25, 2019 at 6:09 PM Shree Devi Kumar <[email protected]> wrote:
>
>> Try replacing a layer - you may need a larger training_text and more
>> iterations
>>
>> lstmtraining --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_layer \
>>   --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm \
>>   --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>   --append_index 5 --net_spec '[Lfx192 O1c1]' \
>>   --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>>   --max_iterations 30000
>>
>> On Mon, Mar 25, 2019 at 4:14 PM 易鑫 <[email protected]> wrote:
>>
>>> Hello, everyone:
>>>
>>> I have been focusing on training eng + chi_sim for several days, but one
>>> urgent issue confuses me. I asked this question before but did not get a
>>> good reply, so I am asking again. Sorry for disturbing you.
>>>
>>> My steps are as follows:
>>>
>>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>>>   --training_text ../training_data/chi_sim_tuned.txt \
>>>   --langdata_dir ../langdata --tessdata_dir ./tessdata --lang chi_sim \
>>>   --linedata_only --noextract_font_properties --exposures "0" \
>>>   --workspace_dir ./share/workspace/tmp \
>>>   --save_box_tiff \
>>>   --fontlist "NSimSun" \
>>>     "Times New Roman" \
>>>     "Arial Unicode MS" \
>>>     "SimSun" \
>>>     "Merchant Copy" \
>>>     "Merchant Copy Doublesize" \
>>>     "Noto Sans CJK SC" \
>>>     "Noto Sans Mono CJK SC" \
>>>   --output_dir ~/tesstutorial/chi_sim_train \
>>>   --overwrite
>>>
>>> mkdir -p ~/tesstutorial/chi_sim_tuned_from_chi_sim
>>>
>>> combine_tessdata -e ../tessdata_best/chi_sim.traineddata \
>>>   ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm
>>>
>>> lstmtraining --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned \
>>>   --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm \
>>>   --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>>   --old_traineddata ../tessdata_best/chi_sim.traineddata \
>>>   --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>>>   --max_iterations 3000
>>>
>>> lstmtraining --stop_training \
>>>   --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned_checkpoint \
>>>   --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>>   --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned.traineddata
>>>
>>> The training text file is attached.
>>>
>>> What confuses me is that the result contains some characters that are
>>> not in the training text file (only chi_sim characters have the problem;
>>> eng is OK).
>>>
>>> Can anyone help me? Thanks a lot.
>>> I have also uploaded the image and the result file. Thanks in advance.
>>>
>>> Thank you.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/4af9e1d1-218a-4a36-8a77-1b4619b53205%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>
>> --
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
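
On the original problem quoted above (characters in the result that are not in the training text), it may help to list exactly which characters those are before changing the training setup. The following is only a sketch: the function name chars_not_in_training is made up for illustration, it assumes both files are UTF-8 text, and it assumes grep runs in a UTF-8 locale so that multi-byte characters are not split into bytes.

```shell
# chars_not_in_training TRAINING_TEXT OCR_OUTPUT
# Prints, one per line, each distinct non-whitespace character that occurs
# in OCR_OUTPUT but never occurs in TRAINING_TEXT.
# (Hypothetical helper, not part of Tesseract.)
chars_not_in_training() {
  training_text=$1
  ocr_output=$2
  tmp_train=$(mktemp) || return 1
  tmp_ocr=$(mktemp) || { rm -f "$tmp_train"; return 1; }

  # Split each file into one character per line, dropping whitespace,
  # then sort and deduplicate in byte order so comm sees consistent input.
  grep -o '[^[:space:]]' "$training_text" | LC_ALL=C sort -u > "$tmp_train"
  grep -o '[^[:space:]]' "$ocr_output" | LC_ALL=C sort -u > "$tmp_ocr"

  # comm -13 suppresses lines unique to the first file and lines common to
  # both, leaving only characters unique to the OCR output.
  LC_ALL=C comm -13 "$tmp_train" "$tmp_ocr"

  rm -f "$tmp_train" "$tmp_ocr"
}
```

It could be called as `chars_not_in_training ../training_data/chi_sim_tuned.txt result.txt`, where result.txt stands in for whatever file holds the OCR result (that name is an assumption, the thread does not give one).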

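
On the overfitting question at the top of this message: one way to check, rather than guess, is to run lstmeval on several checkpoints against lines that were not used for training and see whether the error it reports keeps going down. This is a rough sketch under assumptions: the wrapper name eval_checkpoints is made up, and the thread does not mention an evaluation list file, so one would have to be generated separately (for example with tesstrain.sh from a different text into a different output directory).

```shell
# eval_checkpoints TRAINEDDATA EVAL_LISTFILE CHECKPOINT...
# Runs lstmeval once per checkpoint so the reported error rates can be
# compared side by side. (Hypothetical wrapper around the real lstmeval tool.)
eval_checkpoints() {
  if [ "$#" -lt 3 ]; then
    echo "usage: eval_checkpoints TRAINEDDATA EVAL_LISTFILE CHECKPOINT..." >&2
    return 2
  fi
  traineddata=$1
  eval_listfile=$2
  shift 2

  for checkpoint in "$@"; do
    # Label each run so the output can be matched to its checkpoint.
    echo "== $checkpoint"
    lstmeval --model "$checkpoint" \
      --traineddata "$traineddata" \
      --eval_listfile "$eval_listfile" 2>&1
  done
}
```

If the error on the held-out lines stops improving or gets worse while the training error keeps falling, that would suggest overfitting, and an earlier checkpoint may be the better one to convert with --stop_training.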
