Increase the number of ± in the training text to about 100.

On Tue, Jun 18, 2019 at 7:39 PM Jingjing Lin <joejoeu...@gmail.com> wrote:
> Sorry to bother you again and again.
> I reduced the training text to about 450 lines, with about 30 ± in it. I
> used two fonts and 1000 iterations. But it looks like ± is still not
> picked up in the BEST OCR TEXT at all; it always recognizes ± as something
> else. What is happening here? Should I increase the number of ±? Or do I
> need to increase the number of fonts? I'm also trying more iterations.
>
> On Tuesday, June 18, 2019 at 12:28:25 AM UTC-4, shree wrote:
>>
>> If you increase the iterations, then the "plus a few characters" type of
>> fine-tuning will not give good results, i.e. the other letters will lose
>> accuracy.
>>
>> You can try to reduce the training text size while still keeping all the
>> characters that you need as part of the training text.
>>
>> On Tue, Jun 18, 2019 at 2:24 AM Jingjing Lin <joejo...@gmail.com> wrote:
>>
>>> I was only using two different fonts, and it only achieved a lowest error
>>> rate of 11.271 after training. Does this mean I really need to increase
>>> the iterations?
>>>
>>> On Monday, June 17, 2019 at 2:16:31 PM UTC-4, shree wrote:
>>>>
>>>> How big was your training text? How many iterations? Did the fonts you
>>>> used for training support the plus-minus sign?
>>>>
>>>> You can run training with --debug_interval -1 so that you can see in
>>>> the console messages whether the plus-minus sign is being picked up
>>>> for training.
>>>>
>>>> On Mon, 17 Jun 2019, 23:29 Jingjing Lin, <joejo...@gmail.com> wrote:
>>>>
>>>>> Thanks. It works. The new character I added was there.
>>>>>
>>>>> Do you have any idea why, after fine-tuning, Tesseract still couldn't
>>>>> recognize the new character I added? When I tried to add '±' to eng it
>>>>> worked, but when I tried to add '±' to chi_sim, it didn't work (explained
>>>>> below). Is there anything we need to pay attention to when fine-tuning
>>>>> languages other than eng?
>>>>>
>>>>> I used
>>>>>
>>>>> lstmeval --model ~/tesstutorial/trainplusminus/plusminus_checkpoint \
>>>>>   --traineddata ~/tesstutorial/trainplusminus/chi_sim/chi_sim.traineddata \
>>>>>   --eval_listfile ~/tesstutorial/evalplusminus/chi_sim.training_files.txt \
>>>>>   2>&1 | grep ±
>>>>>
>>>>> to check, and ± only shows up in Truth but not in OCR.
>>>>>
>>>>>
>>>>> On Monday, June 17, 2019 at 11:31:24 AM UTC-4, shree wrote:
>>>>>>
>>>>>> combine_tessdata -u new.traineddata new.
>>>>>>
>>>>>> will unpack the traineddata file. Check new.lstm-unicharset among the
>>>>>> unpacked files.
>>>>>>
>>>>>> On Monday, June 17, 2019 at 8:20:24 PM UTC+5:30, Jingjing Lin wrote:
>>>>>>>
>>>>>>> I tried to fine-tune the model and add a new character via training,
>>>>>>> but it seems it still couldn't recognize this new character using the
>>>>>>> newly generated traineddata. To debug, I want to check whether this
>>>>>>> new character is in the .unicharset in the newly generated
>>>>>>> traineddata. Is there a way to do this?
>>>>>>>
>>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesser...@googlegroups.com.
>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/d251e677-5f9d-4f8f-b41a-aa015538ca47%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/d251e677-5f9d-4f8f-b41a-aa015538ca47%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to tesser...@googlegroups.com. >>> To post to this group, send email to tesser...@googlegroups.com. >>> Visit this group at https://groups.google.com/group/tesseract-ocr. >>> To view this discussion on the web visit >>> https://groups.google.com/d/msgid/tesseract-ocr/692ad4d1-ff8e-4a67-a582-645a3fa5b941%40googlegroups.com >>> <https://groups.google.com/d/msgid/tesseract-ocr/692ad4d1-ff8e-4a67-a582-645a3fa5b941%40googlegroups.com?utm_medium=email&utm_source=footer> >>> . >>> For more options, visit https://groups.google.com/d/optout. >>> >> >> >> -- >> >> ____________________________________________________________ >> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com >> > -- > You received this message because you are subscribed to the Google Groups > "tesseract-ocr" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to tesseract-ocr+unsubscr...@googlegroups.com. > To post to this group, send email to tesseract-ocr@googlegroups.com. > Visit this group at https://groups.google.com/group/tesseract-ocr. > To view this discussion on the web visit > https://groups.google.com/d/msgid/tesseract-ocr/6d299e90-fc12-4a52-989f-5b787db5f1f7%40googlegroups.com > <https://groups.google.com/d/msgid/tesseract-ocr/6d299e90-fc12-4a52-989f-5b787db5f1f7%40googlegroups.com?utm_medium=email&utm_source=footer> > . > For more options, visit https://groups.google.com/d/optout. > -- ____________________________________________________________ भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com. To post to this group, send email to tesseract-ocr@googlegroups.com. Visit this group at https://groups.google.com/group/tesseract-ocr. 
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduXrFmPvprJTg3GjFWPdoEsNDWuOpWPW929kz6COuoO_jw%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
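Following the advice at the top of the thread to include about 100 ± in the training text, it helps to count how many are actually present before regenerating the training data. The shell sketch below is a minimal example, assuming a UTF-8 training text; the function name `count_char` and the script name are hypothetical. It counts occurrences rather than lines, so a line containing `±±` counts twice.

```shell
#!/usr/bin/env bash
# Count how many times a given character occurs in a text file.
# Usage: count_char CHAR FILE   (count_char is a hypothetical helper name)
count_char() {
  local ch="$1" file="$2"
  # grep -o prints every match on its own line; wc -l counts those lines.
  # "|| true" keeps the pipeline quiet when there are no matches at all.
  # tr strips the padding some wc implementations put before the number.
  { grep -o -F -- "$ch" "$file" || true; } | wc -l | tr -d ' '
}

# Pass the path of your training text as the first argument.
if [ $# -ge 1 ]; then
  count_char '±' "$1"
fi
```

Run it as `bash count_char.sh path/to/training_text`; if the number is well below 100, add more lines that use ± in realistic contexts before training again.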
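shree's suggestion in the thread, `combine_tessdata -u new.traineddata new.`, unpacks the components so that `new.lstm-unicharset` can be inspected. The sketch below automates the lookup. It assumes the usual unicharset layout, where the first line holds the number of entries and each following line begins with the character itself followed by a space; the helper name `in_unicharset` is hypothetical. A match only shows that ± is in the model's character set; as the thread shows, the network may still fail to output it when there were too few training samples.

```shell
#!/usr/bin/env bash
# Report whether a character appears as an entry in an unpacked unicharset.
# Assumed layout: line 1 is the entry count; every later line starts with
# the character, then a space, then its properties.
in_unicharset() {
  local ch="$1" file="$2"
  # Compare the first whitespace-separated field of each entry line with
  # the requested character; exit status 0 means it was found.
  awk -v c="$ch" 'NR > 1 && $1 == c { found = 1 } END { exit !found }' "$file"
}

# Typical use, after unpacking with:
#   combine_tessdata -u new.traineddata new.
# pass the resulting new.lstm-unicharset as the first argument.
if [ $# -ge 1 ]; then
  if in_unicharset '±' "$1"; then
    echo "± found in $1"
  else
    echo "± NOT found in $1"
  fi
fi
```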