Increase the number of ± in the training text to about 100.
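Before retraining, you can check how many ± signs the training text actually contains. This is a minimal shell sketch. The function name `count_char` and the file name `training_text.txt` are hypothetical, and it assumes a UTF-8 text file and a `grep` that supports `-o` (GNU and BSD grep both do).

```shell
# count_char CHAR FILE  (hypothetical helper)
# Print how many times CHAR occurs in FILE.
# grep -o emits every match on its own line, so counting lines counts matches;
# -F treats CHAR as a fixed string rather than a regular expression.
count_char() {
  grep -oF -- "$1" "$2" | wc -l | tr -d ' '
}
```

For example, `count_char '±' training_text.txt` should print a number close to 100 once the text has been extended as suggested.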

On Tue, Jun 18, 2019 at 7:39 PM Jingjing Lin <joejoeu...@gmail.com> wrote:

> Sorry to bother you again and again.
> I reduced the training text to about 450 lines, with about 30 ± in it. I
> used two fonts and 1000 iterations, but it looks like ± is still not
> picked up in the BEST OCR TEXT at all; it is always recognized as something
> else. What is happening here? Should I increase the number of ±, or do I
> need to increase the number of fonts? I'm trying more iterations now.
>
> On Tuesday, June 18, 2019 at 12:28:25 AM UTC-4, shree wrote:
>>
>> If you increase the iterations, the "plus" type of training will not
>> give good results, i.e. the other letters will lose accuracy.
>>
>> You can try to reduce the training text size while still keeping all the
>> characters that you need as part of the training text.
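One way to confirm that a cut-down training text still covers every character you need is to test each required character against the file. This is a sketch under assumptions: `missing_chars` is a hypothetical helper, and the characters you pass to it are whatever your own documents require.

```shell
# missing_chars FILE CHAR...  (hypothetical helper)
# Print each CHAR that does not occur anywhere in FILE.
# Returns 0 if every character is present, 1 if any is missing.
missing_chars() {
  file=$1
  shift
  status=0
  for ch in "$@"; do
    # -F: fixed-string match, so characters like '.' or '*' are taken literally
    if ! grep -qF -- "$ch" "$file"; then
      printf '%s\n' "$ch"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_chars training_text.txt '±' '%'` (hypothetical file name) prints nothing and succeeds only if both characters survived the trimming.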
>>
>> On Tue, Jun 18, 2019 at 2:24 AM Jingjing Lin <joejo...@gmail.com> wrote:
>>
>>> I was only using two different fonts, and it only achieved a lowest error
>>> rate of 11.271 after training. Does this mean I really need to increase
>>> the iterations?
>>>
>>> On Monday, June 17, 2019 at 2:16:31 PM UTC-4, shree wrote:
>>>>
>>>> How big was your training text? How many iterations? Did the fonts you
>>>> used for training support the plus-minus sign?
>>>>
>>>> You can run training with --debug_interval -1 so that you can see in the
>>>> console messages whether the plus-minus sign is being picked up during
>>>> training.
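With `--debug_interval -1`, the console output can be saved to a file and searched. The sketch below counts the lines that carry a given label and also contain a given character, so ± can be compared between the truth lines and the OCR lines. The label strings (`TRUTH` and `BEST OCR TEXT` for lstmtraining, `Truth` and `OCR` for lstmeval) are assumptions based on the output mentioned in this thread, and `count_labelled` and `train.log` are hypothetical names.

```shell
# count_labelled LOG LABEL CHAR  (hypothetical helper)
# Count the lines of LOG that contain both LABEL and CHAR.
# Both greps use -F, so the label and the character are matched literally
# and case-sensitively.
count_labelled() {
  grep -F -- "$2" "$1" | grep -cF -- "$3"
}
```

If `count_labelled train.log 'TRUTH' '±'` reports many lines while `count_labelled train.log 'BEST OCR TEXT' '±'` stays at 0, the sign is reaching the training data but the model has not learned to output it yet.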
>>>>
>>>> On Mon, 17 Jun 2019, 23:29 Jingjing Lin, <joejo...@gmail.com> wrote:
>>>>
>>>>> Thanks. It works. The new character I added was there.
>>>>>
>>>>> Do you have any idea why, after fine-tuning, Tesseract still can't
>>>>> recognize the new character I added? When I added '±' to eng it worked,
>>>>> but when I added '±' to chi_sim it didn't (explained below). Is there
>>>>> anything we need to pay attention to when fine-tuning languages other
>>>>> than eng?
>>>>>
>>>>> I used
>>>>>
>>>>> lstmeval --model ~/tesstutorial/trainplusminus/plusminus_checkpoint \
>>>>>   --traineddata ~/tesstutorial/trainplusminus/chi_sim/chi_sim.traineddata \
>>>>>   --eval_listfile ~/tesstutorial/evalplusminus/chi_sim.training_files.txt \
>>>>>   2>&1 | grep ±
>>>>>
>>>>> to check, and ± shows up only in the Truth output, not in the OCR output.
>>>>>
>>>>>
>>>>> On Monday, June 17, 2019 at 11:31:24 AM UTC-4, shree wrote:
>>>>>>
>>>>>> combine_tessdata -u new.traineddata new.
>>>>>>
>>>>>> will unpack the traineddata file. Check new.lstm-unicharset among the
>>>>>> unpacked files.
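After unpacking, the presence of a character can be checked directly in the unicharset. This sketch assumes the usual unicharset layout (the entry count on the first line, then one entry per line that begins with the character followed by a space); `in_unicharset` is a hypothetical helper name.

```shell
# in_unicharset CHAR FILE  (hypothetical helper)
# Succeed if CHAR is an entry in the unicharset FILE.
# tail skips the count line, cut keeps the first space-separated field
# (the character itself), and grep -xF requires an exact whole-line match.
in_unicharset() {
  tail -n +2 "$2" | cut -d ' ' -f 1 | grep -xF -- "$1" >/dev/null
}
```

For example, `in_unicharset '±' new.lstm-unicharset && echo present` prints `present` only if ± made it into the new traineddata.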
>>>>>>
>>>>>> On Monday, June 17, 2019 at 8:20:24 PM UTC+5:30, Jingjing Lin wrote:
>>>>>>>
>>>>>>> I tried to fine-tune the model and add a new character via training,
>>>>>>> but it seems it still can't recognize this new character using the
>>>>>>> newly generated traineddata. To debug, I want to check whether this new
>>>>>>> character is in the .unicharset in the new traineddata. Is there a way
>>>>>>> to do this?
>>>>>>>
>>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesser...@googlegroups.com.
>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/d251e677-5f9d-4f8f-b41a-aa015538ca47%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/d251e677-5f9d-4f8f-b41a-aa015538ca47%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>
>>
>> --
>>
>> ____________________________________________________________
>> Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
>>

