The training text was only about 2200 lines (200 kB), and I trained for 3600 iterations. The fonts I used support ±.
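For reference, here is a minimal Python sketch of how one might check how often ± actually occurs in a training text. The function name and the sample string are hypothetical and only for illustration; this is not part of the Tesseract tooling.

```python
from typing import Tuple


def count_lines_with(text: str, target: str = "±") -> Tuple[int, int]:
    """Return (lines containing target, total non-empty lines)."""
    # Ignore blank lines so the ratio reflects real training lines only.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    hits = sum(1 for ln in lines if target in ln)
    return hits, len(lines)


# Hypothetical sample standing in for the real training text file.
sample = "温度 25 ± 2\n没有符号的一行\n误差 ±0.5\n"
hits, total = count_lines_with(sample)
print(f"{hits} of {total} lines contain ±")
```

If only a handful of lines contain the new character, it may simply be seen too rarely during training, though that is a guess rather than a confirmed cause.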
What do you mean by "whether ± is being picked for training"? When I set --debug_interval -1, I found that every iteration outputs only one line. Does that mean only one line is used for training in each iteration?

On Monday, June 17, 2019 at 2:16:31 PM UTC-4, shree wrote:
>
> How big was your training text? How many iterations? Did the fonts you use
> for training support the plus minus sign?
>
> You can run training with -- debug-level of -1 so that you can see whether
> the plus minus is being picked for training in the console messages.
>
> On Mon, 17 Jun 2019, 23:29 Jingjing Lin, <joejo...@gmail.com> wrote:
>
>> Thanks. It works. The new character I added was there.
>>
>> Do you have any idea why after fine tuning tesseract still couldn't
>> recognize the new character I added? When I tried to add '±' to eng it
>> works, but when I tried to add '±' to chi_sim, it couldn't work (explained
>> below). Is there anything we need to pay attention to when fine tuning
>> other langs rather than eng?
>>
>> I used
>>
>> lstmeval --model ~/tesstutorial/trainplusminus/plusminus_checkpoint \
>>   --traineddata ~/tesstutorial/trainplusminus/chi_sim/chi_sim.traineddata \
>>   --eval_listfile ~/tesstutorial/evalplusminus/chi_sim.training_files.txt 2>&1 |
>>   grep ±
>>
>> to check and ± only shows up in Truth but not in OCR
>>
>> On Monday, June 17, 2019 at 11:31:24 AM UTC-4, shree wrote:
>>>
>>> combine_tessdata -u new.traineddata new.
>>>
>>> will unpack the traineddata file. check new.lstm-unicharset in it
>>>
>>> On Monday, June 17, 2019 at 8:20:24 PM UTC+5:30, Jingjing Lin wrote:
>>>>
>>>> I tried to fine tune the model and add a new character via training,
>>>> but it seems it still couldn't recognize this new character using the new
>>>> traineddata generated. To debug I want to check whether this new character
>>>> is in the .unicharset in the new traineddata generated. Is there a way to
>>>> do this?
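As a follow-up to the unicharset check suggested above, here is a minimal Python sketch for testing whether a character is listed in an unpacked unicharset file such as new.lstm-unicharset. It assumes the usual layout, where the first line holds the entry count and each later line starts with the character followed by its properties. The helper names and the sample lines are hypothetical, and the sample omits most property fields.

```python
from typing import Iterable, Iterator


def unicharset_symbols(lines: Iterable[str]) -> Iterator[str]:
    """Yield the first field of every entry line in a unicharset file."""
    it = iter(lines)
    next(it, None)  # assumption: the first line is the entry count
    for line in it:
        parts = line.split()
        if parts:
            yield parts[0]


def has_symbol(lines: Iterable[str], symbol: str = "±") -> bool:
    """Return True if symbol appears as an entry in the unicharset."""
    return symbol in set(unicharset_symbols(lines))


# Hypothetical, shortened sample; a real file has more fields per line.
sample = ["3", "NULL 0", "a 3", "± 0"]
print(has_symbol(sample))

# With a real unpacked file, one could instead try:
# with open("new.lstm-unicharset", encoding="utf-8") as f:
#     print(has_symbol(f))
```

If the character is present in the unicharset but still only shows up in Truth and never in OCR, the unicharset is probably not the problem and the training data or iteration count is the more likely place to look.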
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f408c974-aa0b-4df9-a364-d1f0ca2a8a1c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.