[tesseract-ocr] Re: how to check .unicharset in a .traineddata file

Jingjing Lin Mon, 17 Jun 2019 10:59:09 -0700

Thanks. It works. The new character I added was there.
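
For anyone following along, here is a minimal sketch of that check. It assumes the traineddata has already been unpacked with combine_tessdata -u, and that each unicharset entry line begins with the character itself followed by a space. has_unichar is a hypothetical helper name, not part of tesseract.

```shell
# Hypothetical helper (not a tesseract tool): succeed if a character has an
# entry in an unpacked unicharset file such as new.lstm-unicharset.
# Assumption: the first whitespace-separated field of each entry line is
# the character itself.
has_unichar() {
  # $1: path to the unicharset file
  # $2: character to look for
  awk -v ch="$2" '$1 == ch { found = 1 } END { exit !found }' "$1"
}

# Usage, after unpacking with: combine_tessdata -u new.traineddata new.
#   has_unichar new.lstm-unicharset '±' && echo "present" || echo "missing"
```

Comparing the whole first field avoids false matches where the character only appears inside some other entry's properties.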

Do you have any idea why, after fine-tuning, tesseract still can't 
recognize the new character I added? When I added '±' to eng it 
worked, but when I added '±' to chi_sim it did not (explained 
below). Is there anything we need to pay attention to when fine-tuning 
languages other than eng?


I used 

lstmeval --model ~/tesstutorial/trainplusminus/plusminus_checkpoint \
  --traineddata ~/tesstutorial/trainplusminus/chi_sim/chi_sim.traineddata \
  --eval_listfile ~/tesstutorial/evalplusminus/chi_sim.training_files.txt 2>&1 |
  grep ±

to check, and ± only shows up in Truth, not in OCR.


On Monday, June 17, 2019 at 11:31:24 AM UTC-4, shree wrote:
>
> combine_tessdata -u new.traineddata new.
>
> will unpack the traineddata file. Check new.lstm-unicharset among the unpacked files.
>
> On Monday, June 17, 2019 at 8:20:24 PM UTC+5:30, Jingjing Lin wrote:
>>
>> I tried to fine tune the model and add a new character via training, but 
>> it seems it still couldn't recognize this new character using the new 
>> traineddata generated. To debug I want to check whether this new character 
>> is in the .unicharset in the new traineddata generated. Is there a way to 
>> do this?
>>
>

