Thanks a lot!

On Tuesday, June 18, 2019 at 2:21:18 PM UTC-4, shree wrote:
>
> I will test tomorrow and let you know
>
> On Tue, 18 Jun 2019, 23:47 Jingjing Lin, <joejo...@gmail.com> wrote:
>
>> It still didn't work after I increased the number of ± to about 100.
>> The error rate after 2000 iterations is about 11, which is pretty high
>> compared to what we get when adding a few characters to eng. With
>> such a high error rate, I would not be surprised that it couldn't recognize
>> special characters like ±. Is this the best chi_sim can do? Or can I
>> increase the number of iterations to bring the error rate down?
>> Thanks for your help.
>>
>> On Tuesday, June 18, 2019 at 10:32:37 AM UTC-4, shree wrote:
>>>
>>> Increase the number of ± in the training text to about 100.
>>>
>>> On Tue, Jun 18, 2019 at 7:39 PM Jingjing Lin <joejo...@gmail.com> wrote:
>>>
>>>> Sorry to bother you again and again.
>>>> I reduced the training text to about 450 lines, with about 30 ± in it. I
>>>> used two fonts and 1000 iterations. But it looks like ± is still not
>>>> picked up in the BEST OCR TEXT at all; it always recognizes ± as something
>>>> else. What is happening here? Should I increase the number of ±? Or do I
>>>> need to increase the number of fonts? I'm now trying more iterations.
>>>>
>>>> On Tuesday, June 18, 2019 at 12:28:25 AM UTC-4, shree wrote:
>>>>>
>>>>> If you increase the iterations, then the "plus" type of training (fine
>>>>> tuning to add a few characters) will not give good results, i.e. the
>>>>> other characters will lose accuracy.
>>>>>
>>>>> You can try to reduce the training text size while still keeping all
>>>>> the characters that you need as part of the training text.
>>>>>
>>>>> On Tue, Jun 18, 2019 at 2:24 AM Jingjing Lin <joejo...@gmail.com> 
>>>>> wrote:
>>>>>
>>>>>> I was only using two different fonts, and the lowest error rate
>>>>>> achieved after training was 11.271. Does this mean I really need to
>>>>>> increase the iterations?
>>>>>>
>>>>>> On Monday, June 17, 2019 at 2:16:31 PM UTC-4, shree wrote:
>>>>>>>
>>>>>>> How big was your training text? How many iterations? Did the fonts
>>>>>>> you used for training support the plus-minus sign?
>>>>>>>
>>>>>>> You can run training with --debug_interval -1 so that you can see
>>>>>>> whether the plus-minus sign is being picked up for training in the
>>>>>>> console messages.
>>>>>>>
>>>>>>> On Mon, 17 Jun 2019, 23:29 Jingjing Lin, <joejo...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks, that works. The new character I added is there.
>>>>>>>>
>>>>>>>> Do you have any idea why, after fine tuning, tesseract still can't
>>>>>>>> recognize the new character I added? When I tried to add '±' to eng it
>>>>>>>> worked, but when I tried to add '±' to chi_sim it didn't (explained
>>>>>>>> below). Is there anything we need to pay attention to when fine tuning
>>>>>>>> languages other than eng?
>>>>>>>>
>>>>>>>> I used
>>>>>>>>
>>>>>>>> lstmeval --model ~/tesstutorial/trainplusminus/plusminus_checkpoint \
>>>>>>>>   --traineddata ~/tesstutorial/trainplusminus/chi_sim/chi_sim.traineddata \
>>>>>>>>   --eval_listfile ~/tesstutorial/evalplusminus/chi_sim.training_files.txt 2>&1 |
>>>>>>>>   grep ±
>>>>>>>>
>>>>>>>> to check, and ± shows up only in the Truth lines, not in the OCR lines.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Monday, June 17, 2019 at 11:31:24 AM UTC-4, shree wrote:
>>>>>>>>>
>>>>>>>>> combine_tessdata -u new.traineddata new.
>>>>>>>>>
>>>>>>>>> will unpack the traineddata file. Check new.lstm-unicharset among
>>>>>>>>> the unpacked files.
>>>>>>>>>
>>>>>>>>> On Monday, June 17, 2019 at 8:20:24 PM UTC+5:30, Jingjing Lin 
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> I tried to fine tune the model and add a new character via
>>>>>>>>>> training, but the newly generated traineddata still doesn't seem
>>>>>>>>>> to recognize the new character. To debug this, I want to check
>>>>>>>>>> whether the new character is in the .unicharset of the new
>>>>>>>>>> traineddata. Is there a way to do this?
>>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>> send an email to tesser...@googlegroups.com.
>>>>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>>> To view this discussion on the web visit 
>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/d251e677-5f9d-4f8f-b41a-aa015538ca47%40googlegroups.com
>>>>>>>>  
>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/d251e677-5f9d-4f8f-b41a-aa015538ca47%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>> .
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>>
>>>>> ____________________________________________________________
>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>
>>>
>>>
>
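
The unicharset check suggested earlier in the thread, and the Truth/OCR comparison from the lstmeval run, can be sketched as two small shell helpers. This is only a rough sketch, not an official tool: the helper names are made up for illustration, and it assumes that each unicharset entry line starts with the character itself (after a first line holding the entry count) and that the saved lstmeval output marks its lines with "Truth" and "OCR" prefixes, as described in the thread.

```shell
#!/bin/sh
# Assumes the traineddata was unpacked first, as suggested in the thread:
#   combine_tessdata -u new.traineddata new.
# which writes new.lstm-unicharset alongside the other components.

# unicharset_has_char FILE CHAR   (hypothetical helper name)
# Exits 0 if CHAR is the first field of any entry line in FILE, 1 otherwise.
# Line 1 is assumed to hold the entry count, so it is skipped.
# Appending "" forces a string comparison, so digits are not compared as numbers.
unicharset_has_char() {
  awk -v ch="$2" 'NR > 1 && ($1 "") == (ch "") { found = 1 } END { exit !found }' "$1"
}

# count_char_lines LOG PREFIX CHAR   (hypothetical helper name)
# Prints how many lines of LOG start with PREFIX (e.g. Truth or OCR)
# and also contain CHAR. LOG is lstmeval output saved to a file.
count_char_lines() {
  grep "^$2" "$1" | grep -F -c -- "$3"
}
```

For example, `unicharset_has_char new.lstm-unicharset ± && echo present` reports whether ± made it into the unicharset, and comparing `count_char_lines eval.log Truth ±` with `count_char_lines eval.log OCR ±` (where eval.log is a hypothetical file holding the saved lstmeval output) shows whether the character is ever produced by the model or only appears in the ground truth.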
