Thanks, that works. The new character I added is there. Do you have any idea why, after fine tuning, tesseract still can't recognize the new character? When I added '±' to eng it worked, but when I added '±' to chi_sim it didn't (details below). Is there anything we need to pay attention to when fine tuning languages other than eng?
To check, I ran:

lstmeval --model ~/tesstutorial/trainplusminus/plusminus_checkpoint \
  --traineddata ~/tesstutorial/trainplusminus/chi_sim/chi_sim.traineddata \
  --eval_listfile ~/tesstutorial/evalplusminus/chi_sim.training_files.txt 2>&1 | grep ±

'±' only shows up in the Truth lines, never in the OCR lines.

On Monday, June 17, 2019 at 11:31:24 AM UTC-4, shree wrote:
>
> combine_tessdata -u new.traineddata new.
>
> will unpack the traineddata file. check new.lstm-unicharset in it
>
> On Monday, June 17, 2019 at 8:20:24 PM UTC+5:30, Jingjing Lin wrote:
>>
>> I tried to fine tune the model and add a new character via training, but
>> it seems it still couldn't recognize this new character using the new
>> traineddata generated. To debug I want to check whether this new character
>> is in the .unicharset in the new traineddata generated. Is there a way to
>> do this?
>>
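
For anyone repeating the unicharset check from the quoted reply, here is a minimal Python sketch of how it could be scripted after running combine_tessdata -u. It assumes the unpacked file is plain UTF-8 text whose first line is the entry count and whose remaining lines each start with the character followed by space-separated properties. The helper name unicharset_contains is made up for illustration and is not part of tesseract.

```python
from pathlib import Path


def unicharset_contains(path, char):
    """Return True if `char` appears as an entry in a unicharset text file.

    Assumption: line 1 holds the number of entries, and every later line
    begins with the unichar itself, followed by space-separated properties.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Skip the count line, then take the first space-delimited token per line.
    entries = {line.split(" ", 1)[0] for line in lines[1:] if line}
    return char in entries
```

Calling unicharset_contains("new.lstm-unicharset", "±") on the unpacked file should then say whether the character made it into the model's character set. Note that this only confirms the character is present in the unicharset, not that the network has learned to output it.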