Could you also share some advice from your training experience? I am currently training Vietnamese for Times New Roman only.
The best traineddata is good, but it is big (it covers all fonts) and takes quite a long time to process, so I plan to train from scratch:

...
--eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
--max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log

After 5000 iterations, the error rate is 76.676, which is very high. What should I do next?

Would it help to rerun the above training a second or third time (with the same data in --train_listfile ~)? My understanding is that the traineddata is updated each time.

Is there a way to extract traineddata from best_traineddata for only some selected fonts?

Thanks,
TuPM

On Friday, August 7, 2020 at 9:30:33 AM UTC+7 minh...@gmail.com wrote:

> Many thanks Shree,
>
> As you suggested, I removed the path, and now it works.
>
> By the way, my tesseract and lstmtraining versions:
>
> tesseract 5.0.0-alpha-773-gd33ed
> leptonica-1.78.0
>
> ~ % lstmtraining -v
> 5.0.0-alpha-773-gd33ed
> On Friday, August 7, 2020 at 8:43:02 AM UTC+7 shree wrote:
>
>> If you have tesseract and all training tools installed, you should be
>> able to use
>> tesseract
>> lstmtraining
>> etc. without giving the path.
>>
>> What's the output of
>>
>> which tesseract
>> tesseract -v
>> which lstmtraining
>> lstmtraining -v
>>
>> On Fri, Aug 7, 2020, 01:13 minh...@gmail.com <minh...@gmail.com> wrote:
>>
>>> Sorry, I forgot to mention:
>>>
>>> I use Mac OS 10.15.6 Catalina
>>>
>>> The tesseract version: tesseract 5.0.0-alpha-773-gd33ed
>>>
>>> Also, tesseract is installed via MacPorts, since installing via brew
>>> produced a lot of errors.
>>>
>>> Thanks,
>>> On Friday, August 7, 2020 at 2:40:06 AM UTC+7 minh...@gmail.com wrote:
>>>
>>>> Dear friends,
>>>>
>>>> I have tried to run tesseract following the guide at:
>>>> https://github.com/tesseract-ocr/tesseract/issues/1453
>>>>
>>>> Up to step 10:
>>>>
>>>> SCROLLVIEW_PATH=~/tesseract/java \
>>>> ~/tesseract/src/training/lstmtraining \
>>>> --debug_interval 100 \
>>>> --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
>>>> --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>>>> --model_output ~/tesstutorial/engoutput/base \
>>>> --learning_rate 20e-4 \
>>>> --debug_interval -1 \
>>>> --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
>>>> --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
>>>> --max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log
>>>>
>>>> then nothing happens, and basetrain.log contains:
>>>> zsh: no such file or directory: /Users/minhtupham/tesseract/src/training/lstmtraining
>>>>
>>>> Is the lstmtraining file missing?
>>>> I checked the training folder, and there is a file named "lstmtraining.cpp".
>>>>
>>>> Please tell me what I should do.
>>>>
>>>> Many thanks,
>>>>
>>>> TuPM
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/b45b1f8d-4e84-482b-b0f1-03670a14055en%40googlegroups.com
>>>
>>> <https://groups.google.com/d/msgid/tesseract-ocr/b45b1f8d-4e84-482b-b0f1-03670a14055en%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>
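The problem that started this thread (zsh reporting "no such file or directory" for ~/tesseract/src/training/lstmtraining) went away once the hard-coded path was dropped, as Shree suggested. Below is a minimal sketch of Shree's "which ... / ... -v" check written as a script, assuming a bash-compatible shell. The helper name check_tool is hypothetical and not part of Tesseract. For each tool, it reports where the tool resolves on PATH and, if found, prints the first line of its -v output:

```shell
#!/usr/bin/env bash
# Sketch of the PATH check suggested in the thread (hypothetical helper name).
# Purpose: confirm tesseract and lstmtraining resolve from PATH, so step 10
# can call "lstmtraining" instead of ~/tesseract/src/training/lstmtraining.

# Print "<name>: <resolved path>" if NAME is found on PATH, otherwise
# "<name>: not found on PATH". Returns 0 when found, 1 when not.
check_tool() {
  local name="$1"
  local path
  if path="$(command -v "$name")"; then
    echo "$name: $path"
    return 0
  fi
  echo "$name: not found on PATH"
  return 1
}

for tool in tesseract lstmtraining; do
  if check_tool "$tool"; then
    # Same as typing "tesseract -v" / "lstmtraining -v" by hand;
    # 2>&1 in case the version is written to stderr.
    "$tool" -v 2>&1 | head -n 1
  fi
done
```

If both tools resolve, step 10 can invoke lstmtraining directly in place of ~/tesseract/src/training/lstmtraining, which is what fixed the error in this thread.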