If you increase the iterations, the "plus" type of training (fine-tuning to
add a few characters) will not give good results, i.e. the other characters
will lose accuracy.

You can try to reduce the training text size while still keeping all the
characters that you need as part of the training text.
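
As a rough sketch of trimming the training text, you could keep every line
that contains ± and only a small sample of the remaining lines. The file
names, the sample lines and the value of n_other below are made up for
illustration; they are not taken from your setup.

```shell
#!/bin/sh
# Hypothetical file names; substitute your own training text.
full_text="full.training_text"
small_text="small.training_text"

# Stand-in data so the sketch runs on its own.
printf '%s\n' 'alpha beta' '10 ± 2 mm' 'gamma delta' \
  'tolerance ±0.5' 'epsilon' > "$full_text"

# Keep every line with the new character, plus the first n_other other lines.
n_other=2
{
  grep -F '±' "$full_text"
  grep -vF '±' "$full_text" | head -n "$n_other"
} > "$small_text"

# Count the lines in the reduced text that contain the new character.
grep -cF '±' "$small_text"   # prints 2 for the stand-in data
```

This only guarantees that ± survives the trimming; any other characters you
need should be checked the same way before training.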

On Tue, Jun 18, 2019 at 2:24 AM Jingjing Lin <joejoeu...@gmail.com> wrote:

> I was only using two different fonts, and it only achieved a lowest error
> rate of 11.271 after the training. Does this mean I really need to increase
> the iterations?
>
> On Monday, June 17, 2019 at 2:16:31 PM UTC-4, shree wrote:
>>
>> How big was your training text? How many iterations? Did the fonts you
>> use for training support the plus minus sign?
>>
>> You can run training with --debug_interval -1 so that you can see
>> whether the plus-minus sign is being picked up for training in the console
>> messages.
>>
>> On Mon, 17 Jun 2019, 23:29 Jingjing Lin, <joejo...@gmail.com> wrote:
>>
>>> Thanks. It works. The new character I added was there.
>>>
>>> Do you have any idea why, after fine-tuning, tesseract still couldn't
>>> recognize the new character I added? When I tried to add '±' to eng it
>>> worked, but when I tried to add '±' to chi_sim, it didn't work (explained
>>> below). Is there anything we need to pay attention to when fine-tuning
>>> languages other than eng?
>>>
>>> I used
>>>
>>> lstmeval --model ~/tesstutorial/trainplusminus/plusminus_checkpoint \
>>>   --traineddata ~/tesstutorial/trainplusminus/chi_sim/chi_sim.traineddata \
>>>   --eval_listfile ~/tesstutorial/evalplusminus/chi_sim.training_files.txt 2>&1 |
>>>   grep ±
>>>
>>> to check, and ± only shows up in the Truth lines, not in the OCR lines.
>>>
>>>
>>> On Monday, June 17, 2019 at 11:31:24 AM UTC-4, shree wrote:
>>>>
>>>> combine_tessdata -u new.traineddata new.
>>>>
>>>> will unpack the traineddata file. Check new.lstm-unicharset among the
>>>> unpacked files.
>>>>
>>>> On Monday, June 17, 2019 at 8:20:24 PM UTC+5:30, Jingjing Lin wrote:
>>>>>
>>>>> I tried to fine tune the model and add a new character via training,
>>>>> but it seems it still couldn't recognize this new character using the new
>>>>> traineddata generated. To debug I want to check whether this new character
>>>>> is in the .unicharset in the new traineddata generated. Is there a way to
>>>>> do this?
>>>>>
>>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesser...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/d251e677-5f9d-4f8f-b41a-aa015538ca47%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/d251e677-5f9d-4f8f-b41a-aa015538ca47%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>


-- 

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
