[tesseract-ocr] Help: Unable to Combine the Output Files - Training from Scratch

minh...@gmail.com Mon, 17 Aug 2020 21:13:22 -0700

Dear all, 

I followed the training manual in the wiki, but I still get an error at the end.
I am working on macOS 10.15.6 Catalina
Tesseract 4.1.1
Lstmtraining 4.1.1


Here is my process:

# Create training data, Vietnamese language, *Times New Roman* font only

PANGOCAIRO_BACKEND=fc \
~/tesseract/src/training/tesstrain.sh \
--fonts_dir /Library/Fonts \
--lang vie \
--linedata_only \
--noextract_font_properties \
--exposures "0" \
--langdata_dir ~/tesstutorial/langdata \
--tessdata_dir ~/tesstutorial/tesseract/tessdata \
--fontlist "Times New Roman" \
--training_text ~/tesstutorial/langdata/vie/vie.training_text \
--output_dir ~/tesstutorial/vietrain

In ~/tesstutorial/langdata I put the *best vie.traineddata*, along with 
vie.punc, vie.wordlist and vie.number (I don't know whether these are 
necessary).
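
A quick way to confirm which of those files are actually in place before running tesstrain.sh could look like the sketch below. The function name check_langdata is made up for illustration, and the list of extensions simply mirrors the files named in this message; it is not a statement of what tesstrain.sh requires.

```shell
#!/bin/sh
# Report which of the langdata files named in this message exist.
# check_langdata is a hypothetical helper; the extension list is an
# assumption copied from the message, not a tesstrain.sh requirement.
check_langdata() {
    dir=$1
    lang=$2
    for ext in traineddata punc wordlist number; do
        f="$dir/$lang.$ext"
        if [ -f "$f" ]; then
            echo "found   $f"
        else
            echo "missing $f"
        fi
    done
}

# Example call, using the directory from this message:
# check_langdata "$HOME/tesstutorial/langdata" vie
```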

# Create evaluation data, Vietnamese language, *Times New Roman* font only,
using different data

PANGOCAIRO_BACKEND=fc \
~/tesseract/src/training/tesstrain.sh \
--fonts_dir /Library/Fonts \
--lang vie \
--linedata_only \
--noextract_font_properties \
--exposures "0" \
--langdata_dir ~/tesstutorial/langdata \
--tessdata_dir ~/tesstutorial/tesseract/tessdata \
--fontlist "Times New Roman" \
--training_text ~/tesstutorial/langdata/vie_eval/vie.training_text \
--output_dir ~/tesstutorial/vieeval

(The --langdata_dir directory has the best traineddata, Sep 2017.)

# Then I continue training using lstmtraining

lstmtraining \
--debug_interval 100 \
--traineddata ~/tesstutorial/vietrain/vie/vie.traineddata \
--net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
--model_output ~/tesstutorial/vieoutput/base \
--learning_rate 20e-4 \
--train_listfile ~/tesstutorial/vietrain/vie.training_files.txt \
--eval_listfile ~/tesstutorial/vieeval/vie.training_files.txt \
--max_iterations 100000 &>~/tesstutorial/vieoutput/basetrain.log

So far there is no error, and several base...checkpoint files have been generated.
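
To see which checkpoint was written most recently, something like the following sketch can be used. The helper name latest_checkpoint is hypothetical, and the glob assumes that checkpoint file names start with the --model_output prefix and end in "checkpoint"; adjust it if the files are named differently.

```shell
#!/bin/sh
# Print the most recently modified file whose name starts with the given
# prefix and ends in "checkpoint". Prints nothing if there is no match.
# latest_checkpoint is a hypothetical helper; the naming pattern is an
# assumption based on the --model_output prefix used above.
latest_checkpoint() {
    ls -t "$1"*checkpoint 2>/dev/null | head -n 1
}

# Example call, using the prefix from this message:
# latest_checkpoint "$HOME/tesstutorial/vieoutput/base"
```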

# Last step, combine the output

Do I have to provide the best traineddata so that the final output traineddata 
will have all the required components?

I get the error *Must provide a --traineddata see training wiki*

Here is what I tried:

lstmtraining --stop_training \
--continue_from ~/tesstutorial/vieoutput/base_checkpoint \
--traineddata ~/tesstutorial/vietrain/vie/vie.traineddata \
--model_output ~/tesstutorial/vieoutput/vie.traineddata

(--traineddata is the file produced at the first step)

or

lstmtraining --stop_training \
--continue_from ~/tesstutorial/vieoutput/base_checkpoint \
--traineddata ~/tesstutorial/vietrain/vie/vie.traineddata \
--old_traineddata ~/tesstutorial/langdata/vie.traineddata \
--model_output ~/tesstutorial/vieoutput/vie.traineddata

(--traineddata is the file produced at the first step; the --old_traineddata 
directory has the best traineddata, Sep 2017)
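
To rule out a mistyped path in the combine step, the inputs can be checked before lstmtraining is called. This is only a sketch: require_files and combine_model are hypothetical helper names, the lstmtraining flags are the ones already used in this message, and the check does not by itself explain the error message.

```shell
#!/bin/sh
# Return non-zero and name the first argument that is not a regular file.
# require_files is a hypothetical helper, not part of Tesseract.
require_files() {
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "not a file: $f" >&2
            return 1
        fi
    done
}

# Run the combine step only when both input files exist.
# Arguments: checkpoint, starter traineddata, output traineddata.
combine_model() {
    require_files "$1" "$2" || return 1
    lstmtraining --stop_training \
        --continue_from "$1" \
        --traineddata "$2" \
        --model_output "$3"
}

# Example call, using the paths from this message:
# combine_model "$HOME/tesstutorial/vieoutput/base_checkpoint" \
#     "$HOME/tesstutorial/vietrain/vie/vie.traineddata" \
#     "$HOME/tesstutorial/vieoutput/vie.traineddata"
```

Passing the paths as quoted arguments also means a stray space inside a path shows up as a "not a file" message instead of being split silently into two arguments.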

I read the wiki carefully, but could not find a solution.
Could anyone please point out what is wrong with my process?
Is there anything missing?

Many thanks,

TuPM 
