36000 iterations, error rate 0.1. OCR output attached.
On Mon, Mar 25, 2019 at 6:09 PM Shree Devi Kumar <[email protected]> wrote:

> Try replacing a layer - you may need a larger training_text and more
> iterations.
>
> lstmtraining --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_layer \
>   --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm \
>   --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>   --append_index 5 --net_spec '[Lfx192 O1c1]' \
>   --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>   --max_iterations 30000
>
> On Mon, Mar 25, 2019 at 4:14 PM 易鑫 <[email protected]> wrote:
>
>> Hello everyone,
>>
>> I have been working on training eng + chi_sim for several days, but one
>> urgent issue confuses me. I asked this question before but did not get a
>> good reply, so I am asking again. Sorry for disturbing you.
>>
>> My steps are as follows:
>>
>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>>   --training_text ../training_data/chi_sim_tuned.txt \
>>   --langdata_dir ../langdata --tessdata_dir ./tessdata --lang chi_sim \
>>   --linedata_only --noextract_font_properties --exposures "0" \
>>   --workspace_dir ./share/workspace/tmp \
>>   --save_box_tiff \
>>   --fontlist "NSimSun" \
>>   "Times New Roman" \
>>   "Arial Unicode MS" \
>>   "SimSun" \
>>   "Merchant Copy" \
>>   "Merchant Copy Doublesize" \
>>   "Noto Sans CJK SC" \
>>   "Noto Sans Mono CJK SC" \
>>   --output_dir ~/tesstutorial/chi_sim_train \
>>   --overwrite
>>
>> mkdir -p ~/tesstutorial/chi_sim_tuned_from_chi_sim
>>
>> combine_tessdata -e ../tessdata_best/chi_sim.traineddata \
>>   ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm
>>
>> lstmtraining --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned \
>>   --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm \
>>   --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>   --old_traineddata ../tessdata_best/chi_sim.traineddata \
>>   --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>>   --max_iterations 3000
>>
>> lstmtraining --stop_training \
>>   --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned_checkpoint \
>>   --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>   --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned.traineddata
>>
>> The train_text file is attached.
>>
>> What confuses me is that the result contains some characters that are
>> not in the train_text file (only chi_sim characters have the problem;
>> eng is OK).
>>
>> Can anyone help me? Thanks a lot.
>> I have also uploaded the image and the result file. Thanks in advance.
>>
>> Thank you.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/4af9e1d1-218a-4a36-8a77-1b4619b53205%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
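For the question quoted in this thread (characters showing up in the result that are not in the train_text file), a rough Python sketch like the following might help list exactly which characters those are. The function name `unexpected_chars` and the sample strings are made up for illustration; in practice you would read the training text and the OCR result from your own files.

```python
from collections import Counter


def unexpected_chars(training_text: str, ocr_text: str) -> Counter:
    """Count characters in the OCR output that never occur in the training text.

    Whitespace is ignored. No Unicode normalization is applied, so full-width
    and half-width forms are treated as different characters.
    """
    known = set(training_text)
    counts = Counter()
    for ch in ocr_text:
        if ch.isspace() or ch in known:
            continue
        counts[ch] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample strings, not taken from the attached files.
    training = "Q345 L40X3 abc"
    ocr = "Q345 L40X3 abd d"
    for ch, n in unexpected_chars(training, ocr).most_common():
        print(f"U+{ord(ch):04X} {ch!r} x{n}")
```

Running this over the real train_text and result files would give a concrete list of the offending characters to post here.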
4 ~ 261 | Q345L90X8 5587 | 2 | 61.16 | 122.32 | å±é¨åè§ åè§
202 | Q345L90A8 5587 | 2 61.16 | 122.32 | å±é¨åè§ åè§
203 | Q345L80X7 5439 | 2 46.37 | 92.74 | å±é¨å¼è§ åè§
204 | Q345L80X7 5439 | 2 46.37 | 92.74 | å±é¨å¼è§ åè§
205 | L45X4 1698 | 4 4.65 | 18.60
206 | L40X3 1327 | 4 2.46| 9.84
207 | L40X4 1519 | 4 3.68 | 414.72
208 | L40X3 902 | 4 1.67| 6.68
209 | L40X4 1325 | 4 3.21| 1412.84
210 | L40X3 476 | 4 0.88| 3.52
211 | Q345-10X360 572 | 2 14.55 | 29.10 | å·è¾¹60mm
212 | Q345-10X360 572 | 2 14.55 | 29.10 | å·è¾¹60mm
213 | L45X4 1615 | 2 4.42 | 8.84
214 | L45AX4 1615 | 2 4.42 | 8.84 | åè§
215 | L40X3 2269 | 2 4.20| 8.40
216 | L40X3 2269 | 2 4.20| 8.40 | åè§
217 | L40X3 2034 | 2 3.77| -7524
218 | L40X3 2034 | 2 3.77| 7.54 | åè§
219 | L408g4 1673 | 2 4.05| 8.10
220 | L40X4 . 1673 | 2 4.05 | 8.10 | åè§
221 | Q345L125X8 1081 | 2 16.76 | 33.52 | åè§
222 | Q345L125A8 1039 | 2 16.11 | 32.22 | åè§
223 | -6X176 286 | 4 2.13| 8.52
224 | Q345-10X187 444 | 4 6.52 | 26.08
225 | L45AX4 2342 | 2 6.41| 12.82
226 | L45A4 2342 | 2 6.41| 12.82 | åè§
227 | L40X3 2261 | 2 4.19| 8.38
228 | L40X3 2261 | 2 4.19| 8.38 | åè§
229 | L40X3 2061 | 2 3.82 | 7.64
230 | L40X3 2061 | 2 3.82 | 7.64 | åè§
231 | -10X50 50 | 2 0.18| 60.36
.232 | -16X50 520 | 2 0.28| 60.56
.233 | -8X50 100 | 2 0.28| ec
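
In most rows of the output above, the last number equals the product of the two numbers before it (for example 2 x 61.16 = 122.32), so the three numeric columns look like quantity, unit value and total. That reading is my assumption, not something stated in the attachment. Under that assumption, a small Python sketch can flag rows where the digits were probably misrecognized. The helper name `check_row` and the tolerance are illustrative choices.

```python
import re

# Matches integers and decimals, with an optional leading minus sign.
NUM = re.compile(r"-?\d+(?:\.\d+)?")


def check_row(row: str, tol: float = 0.01):
    """Check one OCR'd table row for quantity * unit == total.

    Assumes the row looks like 'index | spec length | qty unit | total | note'.
    Returns True or False, or None when fewer than three numbers can be read
    after the second '|'-separated field.
    """
    parts = row.split("|", 2)
    if len(parts) < 3:
        return None
    nums = NUM.findall(parts[2])
    if len(nums) < 3:
        return None
    qty, unit, total = (float(n) for n in nums[:3])
    return abs(qty * unit - total) <= tol


if __name__ == "__main__":
    rows = [
        "205 | L45X4 1698 | 4 4.65 | 18.60",
        "207 | L40X4 1519 | 4 3.68 | 414.72",
        "217 | L40X3 2034 | 2 3.77| -7524",
    ]
    for row in rows:
        if check_row(row) is False:
            print("suspicious:", row)
```

This only catches errors in the numeric columns; it says nothing about the Chinese remarks column or the spec strings such as L408g4.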

