After 36000 iterations, the error rate is 0.1.

OCR output attached.
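One quick way to check the problem described in the quoted question (characters in the chi_sim result that never occur in the training text) is to compare the character sets of the two files. This is a minimal sketch, not a Tesseract tool: the function name `chars_not_in` is hypothetical, and it assumes a UTF-8 locale named `C.UTF-8` is installed so that `grep -o .` splits CJK characters correctly. Without that locale, grep falls back to byte-wise splitting and multibyte characters come out garbled.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): print every character that
# appears in an OCR result file but never appears in the training text.
# Usage: chars_not_in TRAINING_TEXT OCR_RESULT
# Assumes the C.UTF-8 locale exists, so grep splits multibyte (e.g. CJK)
# characters correctly and sort/comm compare consistently by code point.
chars_not_in() {
  train_chars=$(mktemp)
  ocr_chars=$(mktemp)
  # grep -o . prints each character on its own line; sort -u deduplicates.
  LC_ALL=C.UTF-8 grep -o . "$1" | LC_ALL=C.UTF-8 sort -u > "$train_chars"
  LC_ALL=C.UTF-8 grep -o . "$2" | LC_ALL=C.UTF-8 sort -u > "$ocr_chars"
  # comm -13 keeps only lines unique to the second file (the OCR result).
  LC_ALL=C.UTF-8 comm -13 "$train_chars" "$ocr_chars"
  rm -f "$train_chars" "$ocr_chars"
}
```

For example, `chars_not_in ../training_data/chi_sim_tuned.txt result.txt` (where `result.txt` stands in for the attached OCR output) lists each unexpected character once. Spaces and punctuation are compared too, so an empty result means every character in the output also appears somewhere in the training text.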


On Mon, Mar 25, 2019 at 6:09 PM Shree Devi Kumar <[email protected]>
wrote:

> Try replacing a layer. You may need a larger training_text and more
> iterations.
>
> lstmtraining \
>   --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_layer \
>   --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm \
>   --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>   --append_index 5 --net_spec '[Lfx192 O1c1]' \
>   --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>   --max_iterations 30000
>
> On Mon, Mar 25, 2019 at 4:14 PM 易鑫 <[email protected]> wrote:
>
>> Hello everyone,
>>
>> I have been focusing on training eng + chi_sim for several days, but one
>> urgent issue confuses me. I asked this question before but did not get a
>> helpful reply, so I am asking again. Sorry for disturbing you.
>>
>> My steps are as follows:
>>
>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>>   --training_text ../training_data/chi_sim_tuned.txt \
>>   --langdata_dir ../langdata --tessdata_dir ./tessdata --lang chi_sim \
>>   --linedata_only --noextract_font_properties --exposures "0" \
>>   --workspace_dir ./share/workspace/tmp \
>>   --save_box_tiff \
>>   --fontlist "NSimSun" \
>>     "Times New Roman" \
>>     "Arial Unicode MS" \
>>     "SimSun" \
>>     "Merchant Copy" \
>>     "Merchant Copy Doublesize" \
>>     "Noto Sans CJK SC" \
>>     "Noto Sans Mono CJK SC" \
>>   --output_dir ~/tesstutorial/chi_sim_train \
>>   --overwrite
>>
>> mkdir -p ~/tesstutorial/chi_sim_tuned_from_chi_sim
>>
>> combine_tessdata -e ../tessdata_best/chi_sim.traineddata \
>>   ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm
>>
>>
>> lstmtraining \
>>   --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned \
>>   --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm \
>>   --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>   --old_traineddata ../tessdata_best/chi_sim.traineddata \
>>   --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>>   --max_iterations 3000
>>
>> lstmtraining --stop_training \
>>   --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned_checkpoint \
>>   --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>   --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned.traineddata
>>
>> The training_text file is attached.
>>
>>
>> What confuses me is that the result contains some characters that are
>> not in the training_text file. (Only chi_sim characters have this problem;
>> eng is OK.)
>>
>> Can anyone help me? Thanks a lot.
>> I have also uploaded the image and the result file. Thanks in advance.
>>
>> Thank you.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/4af9e1d1-218a-4a36-8a77-1b4619b53205%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/4af9e1d1-218a-4a36-8a77-1b4619b53205%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> --
>
> ____________________________________________________________
> Bhajans, kirtans and aartis (devotional songs) @ http://bhajans.ramparivar.com
>


4 ~
261 | Q345L90X8     5587 | 2 | 61.16 | 122.32 | 局部合角 切角
202 | Q345L90A8      5587 | 2    61.16 | 122.32 | 局部合角 切角
203 | Q345L80X7     5439 | 2    46.37 | 92.74 | 局部开角 切角
204 | Q345L80X7      5439 | 2    46.37 | 92.74 | 局部开角 切角
205 | L45X4                      1698 | 4            4.65 | 18.60
206 | L40X3       1327 | 4    2.46| 9.84
207 | L40X4                     1519 | 4           3.68 | 414.72
208 | L40X3                       902 | 4            1.67| 6.68
209 | L40X4                      1325 | 4            3.21| 1412.84
210 | L40X3                       476 | 4           0.88| 3.52
211 | Q345-10X360     572 | 2    14.55 | 29.10 | 卷边60mm
212 | Q345-10X360     572 | 2    14.55 | 29.10 | 卷边60mm
213 | L45X4         1615 | 2     4.42 | 8.84
214 | L45AX4         1615 | 2     4.42 | 8.84 | 切角
215 | L40X3                     2269 | 2           4.20| 8.40
216 | L40X3         2269 | 2     4.20| 8.40 | 切角
217 | L40X3                    2034 | 2           3.77| -7524
218 | L40X3        2034 | 2     3.77| 7.54 | 切角
219 | L408g4                1673 | 2        4.05| 8.10
220 | L40X4 .       1673 | 2     4.05 | 8.10 | 切角
221 | Q345L125X8     1081 | 2    16.76 | 33.52 | 切角
222 | Q345L125A8     1039 | 2    16.11 | 32.22 | 切角
223 | -6X176       286 | 4    2.13| 8.52
224 | Q345-10X187            444 | 4           6.52 | 26.08
225 | L45AX4                      2342 | 2            6.41| 12.82
226 | L45A4         2342 | 2     6.41| 12.82 | 切角
227 | L40X3                  2261 | 2          4.19| 8.38
228 | L40X3         2261 | 2     4.19| 8.38 | 切角
 229 | L40X3 2061 | 2    3.82 | 7.64
230 | L40X3         2061 | 2     3.82 | 7.64 | 切角
 231 | -10X50       50 | 2   0.18| 60.36
.232 | -16X50       520 | 2    0.28| 60.56
.233 | -8X50        100 | 2    0.28| ec