How big was your training text, and how many iterations did you run? Do the
fonts you used for training support the plus-minus sign (±)?
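
As a quick sanity check on the first question, something like the sketch below
counts how many lines of the training text contain the character at all. The
function name and the file name in the usage comment are only illustrative
assumptions; substitute whatever training text you actually used.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): print the number of lines
# in a training text file that contain the given character.
#   $1 = character to look for, e.g. '±'
#   $2 = path to the training text file
count_char_lines() {
    # -F treats the character as a fixed string, -c prints the count of
    # matching lines (0 if none).
    grep -c -F -- "$1" "$2"
}

# Example (file name is an assumption):
#   count_char_lines '±' chi_sim.training_text
```

If the count is very small compared with the size of the training text, the
new character is probably seen too rarely during fine tuning.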

You can run training with --debug_interval -1 so that the console messages
show whether the plus-minus sign is being picked up for training.
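
Separately from the console messages, the unicharset unpacked with
combine_tessdata -u (as described earlier in this thread) can be checked
directly. The sketch below is a minimal example, assuming the usual unicharset
layout: a count on the first line, then one entry per line with the character
as the first field. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): succeed (exit status 0) if
# the given character is an entry in an unpacked unicharset file.
#   $1 = character to look for, e.g. '±'
#   $2 = path to the unicharset, e.g. new.lstm-unicharset
has_unichar() {
    # Skip the first line (the entry count) and compare the first
    # whitespace-separated field of every other line with the character.
    awk -v ch="$1" 'NR > 1 && $1 == ch { found = 1 } END { exit !found }' "$2"
}

# Example:
#   has_unichar '±' new.lstm-unicharset && echo "present" || echo "missing"
```

If the character is missing here, the traineddata used for fine tuning was
built without it, and no amount of further training will make it appear in the
OCR output.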

On Mon, 17 Jun 2019, 23:29 Jingjing Lin, <joejoeu...@gmail.com> wrote:

> Thanks. It works. The new character I added was there.
>
> Do you have any idea why, after fine tuning, tesseract still can't
> recognize the new character I added? When I added '±' to eng it worked,
> but when I added '±' to chi_sim it didn't (explained below). Is there
> anything we need to pay attention to when fine tuning languages other
> than eng?
>
> I used
>
> lstmeval --model ~/tesstutorial/trainplusminus/plusminus_checkpoint \
>   --traineddata ~/tesstutorial/trainplusminus/chi_sim/chi_sim.traineddata \
>   --eval_listfile ~/tesstutorial/evalplusminus/chi_sim.training_files.txt 2>&1 |
>   grep ±
>
> to check, and ± only shows up in the Truth lines, never in the OCR lines.
>
>
> On Monday, June 17, 2019 at 11:31:24 AM UTC-4, shree wrote:
>>
>> combine_tessdata -u new.traineddata new.
>>
>> will unpack the traineddata file. Check new.lstm-unicharset among the
>> unpacked files.
>>
>> On Monday, June 17, 2019 at 8:20:24 PM UTC+5:30, Jingjing Lin wrote:
>>>
>>> I tried to fine tune the model and add a new character via training, but
>>> it seems it still couldn't recognize this new character using the new
>>> traineddata generated. To debug I want to check whether this new character
>>> is in the .unicharset in the new traineddata generated. Is there a way to
>>> do this?
>>>
>> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/d251e677-5f9d-4f8f-b41a-aa015538ca47%40googlegroups.com
> For more options, visit https://groups.google.com/d/optout.
>
