[tesseract-ocr] Re: Fine tuning existing model

Raniem Thu, 06 Sep 2018 16:01:32 -0700

Thanks for the detailed answer. I am giving it a shot and hoping to get 
some better results :)


Thanks for all your help and support

Best Regards

On Friday, June 29, 2018 at 1:01:08 PM UTC+1, Lorenzo Blz wrote:
>
> 
>
> Hi,
> I'm trying to fine-tune an existing model using line images and 
> text labels. I'm running this version:
>
> tesseract 4.0.0-beta.3-56-g5fda
>  leptonica-1.76.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0
>  Found AVX2
>  Found AVX
>  Found SSE
>
>
>
> I used OCR-D to generate lstmf files for the demo data.
>
> If I run the make command it works fine. 
>
> make training MODEL_NAME=prova
>
> Now I isolated this command from the build:
>
> lstmtraining \
>   --traineddata data/prova/prova.traineddata \
  --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
>   --model_output data/checkpoints/prova \
>   --learning_rate 20e-4 \
>   --train_listfile data/list.train \
>   --eval_listfile data/list.eval \
>   --max_iterations 10000
>
> and it works fine.
>
> Now I'm trying to modify it to fine-tune the existing eng model. I made a 
> few attempts, all ending in different errors (see the attached file for 
> full output).
>
> I used:
>
> combine_tessdata -e /usr/local/share/tessdata/eng.traineddata extracted/eng.lstm
>
> to extract the eng.lstm model. 
>
> This seems to work, but I'm not sure it is correct.
>
> lstmtraining \
  --continue_from extracted/eng.lstm \
>   --traineddata data/prova/prova.traineddata \
>   --old_traineddata extracted/eng.traineddata \
>   --model_output data/checkpoints/prova \
>   --learning_rate 20e-4 \
>   --train_listfile data/list.train \
>   --eval_listfile data/list.eval \
>   --max_iterations 10000
>
> (extracted/eng.traineddata is just a copy of eng.traineddata)
>
>
> The training resumes at exactly the RMS of prova_checkpoint (6%), so it 
> looks like it is training from that checkpoint, not from eng.lstm.
>
> Is this correct? What should I change?
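
A possible explanation for the behaviour described above, offered as a sketch rather than a confirmed diagnosis: lstmtraining appears to restore from a file named <model_output>_checkpoint when one already exists, in which case --continue_from would not take effect. Since the earlier "make training" run wrote to data/checkpoints/prova, reusing that prefix would pick up prova_checkpoint. The helper has_checkpoint and the variable MODEL_OUTPUT below are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: detect a leftover checkpoint before starting a fine-tuning run.
# Assumption: lstmtraining restores from "<model_output>_checkpoint" if present.

has_checkpoint() {
    # $1 is the value that would be passed to --model_output
    [ -e "${1}_checkpoint" ]
}

# Default mirrors the prefix used in this thread; override via the environment.
MODEL_OUTPUT="${MODEL_OUTPUT:-data/checkpoints/prova}"

if has_checkpoint "$MODEL_OUTPUT"; then
    echo "existing checkpoint: ${MODEL_OUTPUT}_checkpoint"
else
    echo "no checkpoint for prefix: ${MODEL_OUTPUT}"
fi
```

If a checkpoint is reported, pointing --model_output at a fresh prefix (for example data/checkpoints/eng_ft, a made-up name) or moving the old checkpoint files away should let --continue_from apply.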
> 
> I'm following this guide:
>
>
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters
>
> 
> I think continue_from and traineddata should refer to the eng model and 
> old_traineddata should point to prova.traineddata, but if I do that I get a 
> segmentation fault:
>
> [...]
> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
> Segmentation fault
>
> What am I missing?
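
One likely cause of the assertion, assuming the eng.traineddata under /usr/local/share/tessdata is an integer (fast) model: the Tesseract documentation notes that only the float models from the tessdata_best repository can be used as a starting point for further training. The guide's fine-tuning example also passes the new traineddata as --traineddata and the original model as --old_traineddata, which matches the first attempt above. The dry-run sketch below only prints the commands; BEST, OUT, extract_cmd and finetune_cmd are hypothetical names, and the tessdata_best path is a placeholder.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them (dry run).
# Assumption: BEST points at a float eng.traineddata from tessdata_best.
BEST="${BEST:-tessdata_best/eng.traineddata}"
# Fresh output prefix (hypothetical) so no old checkpoint is picked up.
OUT="${OUT:-data/checkpoints/eng_ft}"

extract_cmd() {
    # Unpack the LSTM component from the float model.
    echo "combine_tessdata -e $BEST extracted/eng.lstm"
}

finetune_cmd() {
    # --traineddata is the new model's traineddata;
    # --old_traineddata is the model being continued from.
    echo "lstmtraining --continue_from extracted/eng.lstm" \
         "--traineddata data/prova/prova.traineddata" \
         "--old_traineddata $BEST" \
         "--model_output $OUT" \
         "--train_listfile data/list.train" \
         "--eval_listfile data/list.eval" \
         "--max_iterations 10000"
}

extract_cmd
finetune_cmd
```

Once the printed commands look right for the local paths, they can be run by hand or piped to sh.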
>
>
> Thanks, bye
>
> Lorenzo
>
>

