Your script looks OK.

> --U $train_output_dir/eng/eng.unicharset \ # not sure if this is necessary; doesn't make a difference

This option is NOT required.
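One more thing to check in the quoted command, in case that comment is in the script itself and not only in this email. In bash, a backslash continues a command onto the next line only when it is the very last character on the line. In `\ # not sure ...` the backslash escapes the following space instead, the rest of the line is parsed as shell input rather than as a comment, and `--model_output` never reaches `lstmtraining`. Below is a minimal sketch, assuming bash, that builds the same `--stop_training` call as an array so each option can still carry a comment. The default paths, the output file name `eng_plus.traineddata`, and the `RUN` switch are assumptions for illustration only.

```bash
#!/usr/bin/env bash
# Sketch only. Variable names follow the quoted script; the default values
# below are assumptions (the output file name is hypothetical).
train_output_dir=${train_output_dir:-./trained_plus_chars}
tesstrain_dir=${tesstrain_dir:-/usr/local/bin/tesseract-training}
final_trained_data_file=${final_trained_data_file:-$train_output_dir/eng_plus.traineddata}

# Comments are allowed between the elements of a bash array literal,
# so each option can be annotated without breaking the command.
stop_training_args=(
  --stop_training
  --continue_from "$train_output_dir/pluschars_checkpoint"
  --traineddata "$train_output_dir/eng/eng.traineddata"
  # --U (unicharset) is not required for this step, so it is left out.
  --model_output "$final_trained_data_file"
)

# Print the command; set RUN=1 in the environment to actually execute it.
printf '%q ' "$tesstrain_dir/lstmtraining" "${stop_training_args[@]}"
echo
if [[ ${RUN:-0} == 1 ]]; then
  "$tesstrain_dir/lstmtraining" "${stop_training_args[@]}"
fi
```

The array form also quotes each path as a single argument, so directory names containing spaces do not split.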
I suggest that you remove the files from any earlier run before running the script. Then look in the $train_output_dir/eng directory and review the unicharset there to see whether your new characters are included. Also check the log file, especially its initial portion, to see whether it shows an increase in the number of characters.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Dec 12, 2017 at 9:24 AM, J Klein <jetm...@gmail.com> wrote:

> On Thursday, December 7, 2017 at 11:55:53 PM UTC-5, shree wrote:
>>
>> Please check the last section on
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>
> Thank you for this tip. I'm getting farther than before. I thought
> --traineddata was my final traineddata output file.
> I now make the final eng.traineddata with "lstmtraining --stop_training ..."
> as follows:
>
> $tesstrain_dir/lstmtraining \
>   --stop_training \
>   --continue_from $train_output_dir/pluschars_checkpoint \
>   --traineddata $train_output_dir/eng/eng.traineddata \
>   --U $train_output_dir/eng/eng.unicharset \ # not sure if this is necessary; doesn't make a difference
>   --model_output $final_trained_data_file
>
> And I get a $final_trained_data_file that I can use to replace
> /usr/local/share/tessdata/eng.traineddata, and it doesn't fail on init3()
> any more. But it doesn't recognize any of the new chars either.
> However, when running
>
> /usr/local/bin/tesseract-training/lstmeval \
>   --model ./trained_plus_chars/pluschars_checkpoint \
>   --traineddata ./trained_plus_chars/eng/eng.traineddata \
>   --eval_listfile ./trained_plus_chars/eng.training_files.txt
>
> it DID recognize the new chars most of the time. So I think there may
> still be something wrong with the construction of the --model_output
> $final_trained_data_file.
>
> My entire training sequence bash script is here:
> https://pastebin.com/gNLvXkiM
>
> Can you tell if there is anything obviously wrong?
>
> Thanks
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/10194cda-9e8d-494c-ae4a-157e3d25f913%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
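The unicharset check suggested in this reply can be scripted. This is a minimal sketch, assuming bash, a POSIX awk, and the usual plain-text unicharset layout: the first line holds the number of entries, and every following line begins with the character itself, followed by its properties. `unicharset_has` and `check_new_chars` are hypothetical helper names, not part of Tesseract.

```bash
#!/usr/bin/env bash
# Sketch: report which of the given characters appear in a unicharset file.
# Assumes line 1 is the entry count and each later line starts with the
# character as its first whitespace-separated field.

# Succeeds (status 0) if character $2 is listed in unicharset file $1.
unicharset_has() {
  local file=$1 ch=$2
  # The character is passed via the environment to avoid awk -v escape
  # processing; appending "" forces a string (not numeric) comparison.
  C="$ch" awk 'NR > 1 && ($1 "") == (ENVIRON["C"] "") { found = 1; exit }
    END { exit !found }' "$file"
}

# Prints one line per character and returns non-zero if any are missing.
check_new_chars() {
  local file=$1
  shift
  local ch missing=0
  for ch in "$@"; do
    if unicharset_has "$file" "$ch"; then
      printf 'present: %s\n' "$ch"
    else
      printf 'MISSING: %s\n' "$ch"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_new_chars "$train_output_dir/eng/eng.unicharset" € §` (with your own new characters in place of € and §) lists each character as present or MISSING. A missing character means it never made it into the unicharset, so the trained model cannot output it.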