
Hi,
I'm trying to fine-tune an existing model using line images and
text labels. I'm running this version:

tesseract 4.0.0-beta.3-56-g5fda
 leptonica-1.76.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found SSE



I used OCR-D to generate lstmf files for the demo data.

If I run the make command it works fine.

make training MODEL_NAME=prova

Now I isolated this command from the build:

lstmtraining \
  --traineddata data/prova/prova.traineddata \
  --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
  --model_output data/checkpoints/prova \
  --learning_rate 20e-4 \
  --train_listfile data/list.train \
  --eval_listfile data/list.eval \
  --max_iterations 10000

and it works fine.
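
For readers unfamiliar with the backtick substitution in --net_spec: it takes the size of the output layer from the first line of data/unicharset. Below is a minimal sketch of the same idea as a helper function. The function name build_net_spec is hypothetical, and the sketch assumes the first line of the unicharset file holds the number of entries, which is what the command above relies on.

```shell
# Build the --net_spec string used above from a unicharset file.
# Assumption: the first line of the file is the number of entries,
# which becomes the size of the output layer (the value after O1c).
# build_net_spec is a hypothetical helper, not part of Tesseract.
build_net_spec() {
  unicharset=$1
  num_classes=$(head -n1 "$unicharset")
  printf '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c%s]\n' "$num_classes"
}

# Possible usage:
#   lstmtraining --net_spec "$(build_net_spec data/unicharset)" ...
```

Printing the string first makes it easy to confirm the output size before starting a long training run.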

Now I'm trying to modify it to fine-tune the existing eng model. I made a
few attempts, all ending in different errors (see the attached file for
the full output).

I used:

combine_tessdata -e /usr/local/share/tessdata/eng.traineddata extracted/eng.lstm

to extract the eng.lstm model.

The following seems to work, but I'm not sure it is correct:

lstmtraining \
  --continue_from  extracted/eng.lstm \
  --traineddata data/prova/prova.traineddata \
  --old_traineddata extracted/eng.traineddata \
  --model_output data/checkpoints/prova \
  --learning_rate 20e-4 \
  --train_listfile data/list.train \
  --eval_listfile data/list.eval \
  --max_iterations 10000

(extracted/eng.traineddata is just a copy of eng.traineddata)


The training resumes at exactly the RMS of prova_checkpoint (6%), so it
looks like it is training from that checkpoint, not from eng.lstm.

Is this correct? What should I change?
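
One way to check this suspicion before launching a run is sketched below. It rests on an assumption inferred only from the attached output, where every run first prints "Loaded file data/checkpoints/prova_checkpoint": that lstmtraining tries "<model_output>_checkpoint" before it looks at --continue_from. The function name first_candidate is hypothetical.

```shell
# Report which file a run would try to load first for a given
# --model_output prefix and --continue_from value.
# Assumption (inferred from the log, not from documentation): an existing
# "<model_output>_checkpoint" is tried before --continue_from. If that
# checkpoint fails to restore, the log shows --continue_from being loaded next.
# first_candidate is a hypothetical helper, not part of Tesseract.
first_candidate() {
  model_output=$1
  continue_from=$2
  if [ -e "${model_output}_checkpoint" ]; then
    printf '%s\n' "${model_output}_checkpoint"
  else
    printf '%s\n' "$continue_from"
  fi
}

# Possible usage:
#   first_candidate data/checkpoints/prova extracted/eng.lstm
```

If this prints the old checkpoint, rerunning with a --model_output prefix that has no checkpoint yet (or moving the old checkpoint aside) would show whether the run then really starts from eng.lstm.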
I'm following this guide:

https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters

I think continue_from and traineddata should refer to the eng model and
old_traineddata should point to prova.traineddata, but if I do that I get a
segmentation fault:

[...]
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
Segmentation fault

What am I missing?


Thanks, bye

Lorenzo

aaa@host .../DATA/DeepLearning/ocrd-train $ lstmtraining \
>   --continue_from  extracted/eng.lstm \
>   --traineddata extracted/eng.traineddata \
>   --model_output data/checkpoints/prova \
>   --learning_rate 20e-4 \
>   --train_listfile data/list.train \
>   --eval_listfile data/list.eval \
>   --max_iterations 10000
Loaded file data/checkpoints/prova_checkpoint, unpacking...
Code range changed from 60 to 111!
Must supply the old traineddata for code conversion!
Loaded file extracted/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from extracted/eng.lstm
Loaded 1/1 pages (1-1) of document data/train/hoffmann_nachtstuecke01_1817_0204_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0175_017.lstmf
Loaded 1/1 pages (1-1) of document data/train/heine_reisebilder02_1827_0056_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0040_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/spielhagen_problematische02_1861_0151_022.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0127_011.lstmf
Loaded 1/1 pages (1-1) of document data/train/wackenroder_herzensergiessungen_1797_0051_001.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0304_002.lstmf
Loaded 1/1 pages (1-1) of document data/train/bismarck_erinnerungen02_1898_0150_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/paul_flegeljahre01_1804_0142_005.lstmf
Loaded 1/1 pages (1-1) of document data/train/alexis_ruhe01_1852_0311_011.lstmf
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
Segmentation fault

aaa@host .../DATA/DeepLearning/ocrd-train $ 
aaa@host .../DATA/DeepLearning/ocrd-train $ # Attempt 1.1
aaa@host .../DATA/DeepLearning/ocrd-train $ lstmtraining \
>   --continue_from  data/checkpoints/prova_checkpoint \
>   --traineddata extracted/eng.traineddata \
>   --old_traineddata data/prova/prova.traineddata \
>   --model_output data/checkpoints/prova \
>   --learning_rate 20e-4 \
>   --train_listfile data/list.train \
>   --eval_listfile data/list.eval \
>   --max_iterations 10000
Loaded file data/checkpoints/prova_checkpoint, unpacking...
Code range changed from 60 to 111!
Must supply the old traineddata for code conversion!
Loaded file data/checkpoints/prova_checkpoint, unpacking...
Code range changed from 60 to 111!
Num (Extended) outputs,weights in Series:
  1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  Lfys48:48, 12480
  Lfx96:96, 55680
  Lrx96:96, 74112
  Lfx256:256, 361472
  Fc111:111, 28527
Total weights = 532431
Previous null char=59 mapped to 110
Continuing from data/checkpoints/prova_checkpoint
Loaded 1/1 pages (1-1) of document data/train/hoffmann_nachtstuecke01_1817_0204_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0040_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/heine_reisebilder02_1827_0056_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/spielhagen_problematische02_1861_0151_022.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0127_011.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0175_017.lstmf
Loaded 1/1 pages (1-1) of document data/train/wackenroder_herzensergiessungen_1797_0051_001.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0304_002.lstmf
Loaded 1/1 pages (1-1) of document data/train/bismarck_erinnerungen02_1898_0150_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/paul_flegeljahre01_1804_0142_005.lstmf
Loaded 1/1 pages (1-1) of document data/train/alexis_ruhe01_1852_0311_011.lstmf
Loaded 1/1 pages (1-1) of document data/train/eichendorff_taugenichts_1826_0036_001.lstmf
Segmentation fault

aaa@host .../DATA/DeepLearning/ocrd-train $ 
aaa@host .../DATA/DeepLearning/ocrd-train $ # Attempt 2, from eng.traineddata
aaa@host .../DATA/DeepLearning/ocrd-train $ lstmtraining \
>   --continue_from  extracted/eng.lstm \
>   --traineddata extracted/eng.traineddata \
>   --model_output data/checkpoints/prova \
>   --learning_rate 20e-4 \
>   --train_listfile data/list.train \
>   --eval_listfile data/list.eval \
>   --max_iterations 10000
Loaded file data/checkpoints/prova_checkpoint, unpacking...
Code range changed from 60 to 111!
Must supply the old traineddata for code conversion!
Loaded file extracted/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from extracted/eng.lstm
Loaded 1/1 pages (1-1) of document data/train/hoffmann_nachtstuecke01_1817_0204_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/heine_reisebilder02_1827_0056_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/spielhagen_problematische02_1861_0151_022.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0127_011.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0040_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0175_017.lstmf
Loaded 1/1 pages (1-1) of document data/train/wackenroder_herzensergiessungen_1797_0051_001.lstmf
Loaded 1/1 pages (1-1) of document data/train/bismarck_erinnerungen02_1898_0150_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0304_002.lstmf
Loaded 1/1 pages (1-1) of document data/train/paul_flegeljahre01_1804_0142_005.lstmf
Loaded 1/1 pages (1-1) of document data/train/alexis_ruhe01_1852_0311_011.lstmf
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
Segmentation fault

aaa@host .../DATA/DeepLearning/ocrd-train $ 
aaa@host .../DATA/DeepLearning/ocrd-train $ # Attempt 3, with old_traineddata (this works but uses the prova checkpoint)
aaa@host .../DATA/DeepLearning/ocrd-train $ 
aaa@host .../DATA/DeepLearning/ocrd-train $ lstmtraining \
>   --continue_from  extracted/eng.lstm \
>   --traineddata data/prova/prova.traineddata \
>   --old_traineddata extracted/eng.traineddata \
>   --model_output data/checkpoints/prova \
>   --learning_rate 20e-4 \
>   --train_listfile data/list.train \
>   --eval_listfile data/list.eval \
>   --max_iterations 10000
Loaded file data/checkpoints/prova_checkpoint, unpacking...
Successfully restored trainer from data/checkpoints/prova_checkpoint
Loaded 1/1 pages (1-1) of document data/train/hoffmann_nachtstuecke01_1817_0204_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0040_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/heine_reisebilder02_1827_0056_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/spielhagen_problematische02_1861_0151_022.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0127_011.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0175_017.lstmf
Loaded 1/1 pages (1-1) of document data/train/wackenroder_herzensergiessungen_1797_0051_001.lstmf
Loaded 1/1 pages (1-1) of document data/train/bismarck_erinnerungen02_1898_0150_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0304_002.lstmf
Loaded 1/1 pages (1-1) of document data/train/paul_flegeljahre01_1804_0142_005.lstmf
Loaded 1/1 pages (1-1) of document data/train/menzel_literatur01_1828_0165_021.lstmf
Loaded 1/1 pages (1-1) of document data/train/alexis_ruhe01_1852_0311_011.lstmf
Loaded 1/1 pages (1-1) of document data/train/wackenroder_herzensergiessungen_1797_0204_018.lstmf
Loaded 1/1 pages (1-1) of document data/train/rosenkranz_aesthetik_1853_0167_017.lstmf
Loaded 1/1 pages (1-1) of document data/train/frapan_bittersuess_1891_0256_005.lstmf
Loaded 1/1 pages (1-1) of document data/train/clauren_liebe_1827_0205_021.lstmf
Loaded 1/1 pages (1-1) of document data/train/gutzkow_wally_1835_0143_007.lstmf
Loaded 1/1 pages (1-1) of document data/train/paul_flegeljahre01_1804_0057_011.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0024_023.lstmf
Loaded 1/1 pages (1-1) of document data/train/perthes_buchhandel_1816_0012_016.lstmf
Loaded 1/1 pages (1-1) of document data/train/poersch_gewerkschaftsbewegung_1897_0018_008.lstmf
^C # this works, stopped

aaa@host .../DATA/DeepLearning/ocrd-train $ # Attempt 4
aaa@host .../DATA/DeepLearning/ocrd-train $ lstmtraining \
>   --continue_from  extracted/eng.lstm \
>   --old_traineddata data/prova/prova.traineddata \
>   --traineddata extracted/eng.traineddata \
>   --model_output data/checkpoints/prova \
>   --learning_rate 20e-4 \
>   --train_listfile data/list.train \
>   --eval_listfile data/list.eval \
>   --max_iterations 10000
Loaded file data/checkpoints/prova_checkpoint, unpacking...
Code range changed from 60 to 111!
Must supply the old traineddata for code conversion!
Loaded file extracted/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 111 to 111!
Num (Extended) outputs,weights in Series:
  1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  Lfys64:64, 20736
  Lfx96:96, 61824
  Lrx96:96, 74112
  Lfx512:512, 1247232
  Fc111:111, 56943
Total weights = 1461007
Previous null char=110 mapped to 110
Continuing from extracted/eng.lstm
Loaded 1/1 pages (1-1) of document data/train/hoffmann_nachtstuecke01_1817_0204_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/spielhagen_problematische02_1861_0151_022.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0127_011.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0040_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/heine_reisebilder02_1827_0056_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0175_017.lstmf
Loaded 1/1 pages (1-1) of document data/train/bismarck_erinnerungen02_1898_0150_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0304_002.lstmf
Loaded 1/1 pages (1-1) of document data/train/wackenroder_herzensergiessungen_1797_0051_001.lstmf
Loaded 1/1 pages (1-1) of document data/train/paul_flegeljahre01_1804_0142_005.lstmf
Loaded 1/1 pages (1-1) of document data/train/alexis_ruhe01_1852_0311_011.lstmf
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
Segmentation fault
