Thanks, Lorenzo. Your method did exactly the magic I needed.
One other question, please. I am attempting to fine-tune only the last layer, so I have replaced

--net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \

in the lstmtraining command with:

--continue_from $(TESSDATA)/$(CONTINUE_FROM).lstm \
--append_index 5 --net_spec '[Lfx256 O1c69]'

but I am getting this error:

int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 222
Makefile:129: recipe for target 'data/checkpoints/eng_checkpoint' failed
make: *** [data/checkpoints/eng_checkpoint] Segmentation fault (core dumped)

Can anyone please advise on what I am doing wrong?

P.S. My unicharset contains 69 characters.

Regards

On Friday, September 7, 2018 at 12:01:06 AM UTC+1, Raniem wrote:
>
> Thanks for the detailed answer, I am giving it a shot and hoping for
> getting some better results :)
>
> Thanks for all your help and support
>
> Best Regards
>
> On Friday, June 29, 2018 at 1:01:08 PM UTC+1, Lorenzo Blz wrote:
>>
>> Hi,
>> I'm trying to do fine tuning of an existing model using line images and
>> text labels. I'm running this version:
>>
>> tesseract 4.0.0-beta.3-56-g5fda
>>  leptonica-1.76.0
>>   libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 :
>>   libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0
>>  Found AVX2
>>  Found AVX
>>  Found SSE
>>
>> I used OCR-D to generate lstmf files for the demo data.
>>
>> If I run the make command it works fine.
>>
>> make training MODEL_NAME=prova
>>
>> Now I isolated this command from the build:
>>
>> lstmtraining \
>>   --traineddata data/prova/prova.traineddata \
>>   --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
>>   --model_output data/checkpoints/prova \
>>   --learning_rate 20e-4 \
>>   --train_listfile data/list.train \
>>   --eval_listfile data/list.eval \
>>   --max_iterations 10000
>>
>> and it works fine.
>>
>> Now I'm trying to modify it to fine tune the existing eng model.
>> I made a few attempts, all ending in different errors (see the attached
>> file for full output).
>>
>> I used:
>>
>> combine_tessdata -e /usr/local/share/tessdata/eng.traineddata extracted/eng.lstm
>>
>> to extract the eng.lstm model.
>>
>> This seems to work, but I'm not sure it is correct:
>>
>> lstmtraining \
>>   --continue_from extracted/eng.lstm \
>>   --traineddata data/prova/prova.traineddata \
>>   --old_traineddata extracted/eng.traineddata \
>>   --model_output data/checkpoints/prova \
>>   --learning_rate 20e-4 \
>>   --train_listfile data/list.train \
>>   --eval_listfile data/list.eval \
>>   --max_iterations 10000
>>
>> (extracted/eng.traineddata is just a copy of eng.traineddata)
>>
>> The training resumes exactly with the RMS of prova_checkpoint (6%), so it
>> looks like it is training from that checkpoint, not from eng.lstm.
>>
>> Is this correct? What should I change?
>>
>> I'm following this guide:
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters
>>
>> I think continue_from and traineddata should refer to the eng model and
>> old_traineddata should point to prova.traineddata, but if I do that I get
>> a segmentation fault:
>>
>> [...]
>> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
>> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
>> Segmentation fault
>>
>> What am I missing?
>>
>> Thanks, bye
>>
>> Lorenzo
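
To make the last-layer replacement described at the top of this message easier to reproduce, here is a minimal dry-run sketch that assembles the command and reads the output class count from the first line of the unicharset instead of hard-coding 69. The helper names num_classes and build_cmd are hypothetical, the default paths are simply the ones that appear in the quoted thread, and the script only prints the command; it does not run lstmtraining and does not address the assertion failure itself.

```shell
#!/bin/sh
# Dry-run sketch: prints an lstmtraining command, does not execute it.
# Default paths are the ones mentioned in the thread; override via environment.
UNICHARSET="${UNICHARSET:-data/unicharset}"
START_MODEL="${START_MODEL:-extracted/eng.lstm}"
TRAINEDDATA="${TRAINEDDATA:-data/prova/prova.traineddata}"

# Hypothetical helper: the first line of a unicharset file is its entry count.
num_classes() {
  [ -r "$1" ] || return 1
  head -n1 "$1" | tr -d '[:space:]'
}

# Hypothetical helper: build the command with the top layers replaced from
# index 5, sizing the output layer from the unicharset passed as $1.
build_cmd() {
  n=$(num_classes "$1") || return 1
  printf '%s\n' "lstmtraining --continue_from $START_MODEL --traineddata $TRAINEDDATA --append_index 5 --net_spec '[Lfx256 O1c$n]' --model_output data/checkpoints/prova --train_listfile data/list.train --eval_listfile data/list.eval --max_iterations 10000"
}

# Print the command only if the unicharset is present.
if [ -f "$UNICHARSET" ]; then
  build_cmd "$UNICHARSET"
fi
```

Printing rather than executing keeps the sketch safe to run anywhere; the printed line can be pasted into a terminal or a Makefile recipe once the paths have been checked.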