[tesseract-ocr] Re: Fine tuning existing model

Raniem Mon, 10 Sep 2018 05:31:21 -0700

Thanks Lorenzo.

Your method did exactly what I needed.


One other question, please. I am attempting to fine-tune only the last 
layer, so I have replaced the
--net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \

in the lstmtraining command with:

--continue_from $(TESSDATA)/$(CONTINUE_FROM).lstm \
--append_index 5 --net_spec '[Lfx256 O1c69]'

but I am getting this error:
int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 222
Makefile:129: recipe for target 'data/checkpoints/eng_checkpoint' failed
make: *** [data/checkpoints/eng_checkpoint] Segmentation fault (core dumped)

Can anyone please advise on what I am doing wrong?
P.S. My unicharset contains 69 characters.
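
For what it's worth, here is a minimal sketch of how the replacement spec could be built without hard-coding 69, reusing the `head -n1 data/unicharset` trick from the original command. The helper name net_spec_for is hypothetical (it is not part of Tesseract), and the lstmtraining call is left as a comment because the paths are only the ones quoted in this thread. As far as I know, the int_mode_ assertion is raised when the --continue_from model is an integer (fast) model rather than a float one such as those in tessdata_best, so that is worth checking before suspecting --append_index.

```shell
#!/bin/sh
# Build the spec for the replaced top of the network.
# The first line of the unicharset holds the number of entries,
# which becomes the size of the O1c output layer.
# net_spec_for is a hypothetical helper, not part of Tesseract.
net_spec_for() {
  unicharset_file=$1
  num_classes=$(head -n1 "$unicharset_file" | tr -d '[:space:]')
  printf '[Lfx256 O1c%s]\n' "$num_classes"
}

# Intended use (paths as quoted in this thread; assumes eng.lstm was
# extracted from a float model, e.g. one from tessdata_best):
#
#   lstmtraining \
#     --continue_from extracted/eng.lstm \
#     --traineddata data/prova/prova.traineddata \
#     --append_index 5 --net_spec "$(net_spec_for data/unicharset)" \
#     --model_output data/checkpoints/prova \
#     --train_listfile data/list.train \
#     --eval_listfile data/list.eval \
#     --max_iterations 10000
```

With a 69-entry unicharset this yields the same '[Lfx256 O1c69]' string as the hand-written spec, but it keeps following the unicharset if characters are added later.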


Regards

On Friday, September 7, 2018 at 12:01:06 AM UTC+1, Raniem wrote:
>
> Thanks for the detailed answer. I am giving it a shot and hoping to get 
> some better results :) 
>
> Thanks for all your help and support
>
> Best Regards
>
> On Friday, June 29, 2018 at 1:01:08 PM UTC+1, Lorenzo Blz wrote:
>>
>> 
>>
>> Hi,
>> I'm trying to do fine tuning of an existing model using line images and 
>> text labels. I'm running this version:
>>
>> tesseract 4.0.0-beta.3-56-g5fda
>>  leptonica-1.76.0
>>   libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : 
>> libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0
>>  Found AVX2
>>  Found AVX
>>  Found SSE
>>
>>
>>
>> I used OCR-D to generate lstmf files for the demo data.
>>
>> If I run the make command it works fine. 
>>
>> make training MODEL_NAME=prova
>>
>> Now I isolated this command from the build:
>>
>> lstmtraining \
>>   --traineddata data/prova/prova.traineddata \
>>   --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
>>   --model_output data/checkpoints/prova \
>>   --learning_rate 20e-4 \
>>   --train_listfile data/list.train \
>>   --eval_listfile data/list.eval \
>>   --max_iterations 10000
>>
>> and it works fine.
>>
>> Now I'm trying to modify it to fine-tune the existing eng model. I made a 
>> few attempts, all ending in different errors (see the attached file for 
>> full output).
>>
>> I used:
>>
>> combine_tessdata -e /usr/local/share/tessdata/eng.traineddata extracted/eng.lstm
>>
>> to extract the eng.lstm model. 
>>
>> This seems to work, but I'm not sure it is correct:
>>
>> lstmtraining \
>>   --continue_from  extracted/eng.lstm \
>>   --traineddata data/prova/prova.traineddata \
>>   --old_traineddata extracted/eng.traineddata \
>>   --model_output data/checkpoints/prova \
>>   --learning_rate 20e-4 \
>>   --train_listfile data/list.train \
>>   --eval_listfile data/list.eval \
>>   --max_iterations 10000
>>
>> (extracted/eng.traineddata is just a copy of eng.traineddata)
>>
>>
>> The training resumes exactly at the RMS of prova_checkpoint (6%), so it 
>> looks like it is training from that checkpoint, not from eng.lstm.
>>
>> Is this correct? What should I change?
>> 
>> I'm following this guide:
>>
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters
>>
>> 
>> I think continue_from and traineddata should refer to the eng model and 
>> old_traineddata should point to prova.traineddata, but if I do that I get a 
>> segmentation fault:
>>
>> [...]
>> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
>> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
>> Segmentation fault
>>
>> What am I missing?
>>
>>
>> Thanks, bye
>>
>> Lorenzo
>>
>>
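On the question of why training resumed at 6%: as far as I understand lstmtraining, it first looks for an existing <model_output>_checkpoint file and, if that loads, restores from it instead of starting from --continue_from. Below is a minimal sketch of a pre-flight check, assuming the --model_output prefix used in the quoted command; has_stale_checkpoint is a hypothetical helper name.

```shell
#!/bin/sh
# Report whether a checkpoint left by an earlier run exists for the
# given --model_output prefix (hypothetical helper, not part of Tesseract).
has_stale_checkpoint() {
  [ -f "${1}_checkpoint" ]
}

# Usage, with the prefix from the quoted command:
if has_stale_checkpoint data/checkpoints/prova; then
  echo "data/checkpoints/prova_checkpoint exists; move it aside or pick a new --model_output" >&2
fi
```

Using a fresh --model_output prefix for each experiment avoids the problem without deleting anything.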

