Thanks for the detailed answer. I am giving it a shot and hoping for better results :)
Thanks for all your help and support.

Best Regards

On Friday, June 29, 2018 at 1:01:08 PM UTC+1, Lorenzo Blz wrote:
>
> Hi,
> I'm trying to fine-tune an existing model using line images and
> text labels. I'm running this version:
>
> tesseract 4.0.0-beta.3-56-g5fda
>  leptonica-1.76.0
>   libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0
>  Found AVX2
>  Found AVX
>  Found SSE
>
> I used OCR-D to generate lstmf files for the demo data.
>
> If I run the make command it works fine:
>
> make training MODEL_NAME=prova
>
> Now I isolated this command from the build:
>
> lstmtraining \
>   --traineddata data/prova/prova.traineddata \
>   --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
>   --model_output data/checkpoints/prova \
>   --learning_rate 20e-4 \
>   --train_listfile data/list.train \
>   --eval_listfile data/list.eval \
>   --max_iterations 10000
>
> and it works fine.
>
> Now I'm trying to modify it to fine-tune the existing eng model. I made a
> few attempts, all ending in different errors (see the attached file for
> full output).
>
> I used:
>
> combine_tessdata -e /usr/local/share/tessdata/eng.traineddata extracted/eng.lstm
>
> to extract the eng.lstm model.
>
> This seems to work, but I'm not sure it is correct:
>
> lstmtraining \
>   --continue_from extracted/eng.lstm \
>   --traineddata data/prova/prova.traineddata \
>   --old_traineddata extracted/eng.traineddata \
>   --model_output data/checkpoints/prova \
>   --learning_rate 20e-4 \
>   --train_listfile data/list.train \
>   --eval_listfile data/list.eval \
>   --max_iterations 10000
>
> (extracted/eng.traineddata is just a copy of eng.traineddata)
>
> The training resumes exactly with the RMS of prova_checkpoint (6%), so it
> looks like it is training from that checkpoint, not from eng.lstm.
>
> Is this correct? What should I change?
>
> I'm following this guide:
>
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters
>
> I think continue_from and traineddata should refer to the eng model and
> old_traineddata should point to prova.traineddata, but if I do that I get a
> segmentation fault:
>
> [...]
> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
> Segmentation fault
>
> What am I missing?
>
> Thanks, bye
>
> Lorenzo
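
On the checkpoint question in the quoted message: one possible explanation (an assumption, not verified against this exact build) is that lstmtraining picks up an existing <model_output>_checkpoint file before it looks at --continue_from. That would match training resuming at the RMS of prova_checkpoint. Below is a minimal shell sketch of a pre-flight check. The helper names checkpoint_state and finetune_cmd are made up for illustration, and the paths are the ones from the quoted commands. The sketch only prints the command; it does not run it.

```shell
#!/bin/sh
# Sketch under assumptions: paths are copied from the quoted message, and the
# helper names below are hypothetical, not part of Tesseract.

# Report whether a checkpoint already exists for a --model_output prefix.
# Assumption: lstmtraining keeps its checkpoint in <prefix>_checkpoint.
checkpoint_state() {
    if [ -e "${1}_checkpoint" ]; then
        echo stale
    else
        echo clean
    fi
}

# Print (do not run) the fine-tuning command for a given output prefix.
# The flag layout is the one from the first fine-tuning attempt quoted above.
finetune_cmd() {
    printf '%s\n' "lstmtraining --continue_from extracted/eng.lstm --traineddata data/prova/prova.traineddata --old_traineddata extracted/eng.traineddata --model_output ${1} --learning_rate 20e-4 --train_listfile data/list.train --eval_listfile data/list.eval --max_iterations 10000"
}
```

If checkpoint_state data/checkpoints/prova prints stale, pointing --model_output at a fresh prefix, for example finetune_cmd data/checkpoints/prova_eng (a hypothetical name), should make it easier to see whether the run really starts from eng.lstm.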