Hello everybody, 

Currently I am trying to train just a few layers of the 
eng_best.traineddata file. I have already created 30,000 box, .gt.txt and .tif 
training files tailored to my problem.

While following the instructions for training Tesseract 4 
(https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#training-just-a-few-layers), 
I ran into the following problems/questions:

1. I have to create .lstmf files in order to run

training/lstmtraining --debug_interval 100 \
--continue_from ~/tesstutorial/eng_from_chi/eng.lstm \
--traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
--append_index 5 --net_spec '[Lfx256 O1c111]' \
--model_output ~/tesstutorial/eng_from_chi/base \
--train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
--eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
--max_iterations 3000 &>~/tesstutorial/eng_from_chi/basetrain.log

but how exactly do I create these .lstmf files manually? I read that they are 
created with tesstrain.sh, but I can't find a proper description of how.
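As far as I understand the tesstrain Makefile, it creates each .lstmf file in two steps: it builds a line-level NAME.box from NAME.tif and NAME.gt.txt with its generate_line_box.py script, then runs tesseract on the image with --psm 13 and the lstm.train config. The sketch below mirrors that by hand and also writes the two list files, which are plain text files with one .lstmf path per line. The directory layout, the GENERATE_LINE_BOX location, the function names make_lstmf/write_lists and the 90/10 split are my own assumptions, not part of the tutorial.

```shell
# Sketch only, meant to be sourced. Assumptions (hypothetical, adjust them):
#  - every line image NAME.tif sits next to its transcription NAME.gt.txt;
#  - GENERATE_LINE_BOX points at tesstrain's generate_line_box.py;
#  - tesseract 4.x or later with its lstm.train config is installed.
GENERATE_LINE_BOX=${GENERATE_LINE_BOX:-./tesstrain/generate_line_box.py}

# Create NAME.box (if missing) and NAME.lstmf for every NAME.gt.txt in a directory.
make_lstmf() {
  local dir=$1 gt base
  for gt in "$dir"/*.gt.txt; do
    [ -e "$gt" ] || continue              # glob matched nothing
    base=${gt%.gt.txt}                    # e.g. /data/gt/line_0001
    if [ ! -f "$base.tif" ]; then
      echo "skipping $gt: no $base.tif" >&2
      continue
    fi
    if [ ! -f "$base.box" ]; then
      # Line-level box file built from the image size and the transcription.
      python3 "$GENERATE_LINE_BOX" -i "$base.tif" -t "$gt" > "$base.box"
    fi
    # --psm 13 treats the image as one raw text line; the lstm.train
    # config makes tesseract write $base.lstmf instead of recognized text.
    tesseract "$base.tif" "$base" --psm 13 lstm.train
  done
}

# Write shuffled train/eval list files (one .lstmf path per line).
# The default 90/10 split is an arbitrary choice.
write_lists() {
  local dir=$1 train_list=$2 eval_list=$3 pct=${4:-90}
  local all total ntrain
  all=$(mktemp)
  find "$dir" -maxdepth 1 -name '*.lstmf' | sort -R > "$all"
  total=$(wc -l < "$all")
  ntrain=$(( total * pct / 100 ))
  head -n "$ntrain" "$all" > "$train_list"
  tail -n "+$(( ntrain + 1 ))" "$all" > "$eval_list"
  rm -f "$all"
}
```

Usage, with a hypothetical script name and absolute paths (absolute paths keep the lists valid no matter where lstmtraining is started): `source lstmf.sh; make_lstmf /data/gt; write_lists /data/gt /data/train.list /data/eval.list`, then pass the two files to --train_listfile and --eval_listfile.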

I need the .lstmf files for the --train_listfile and --eval_listfile 
parameters. 
Is it also necessary to create a separate unicharset file for that, as in the 
workflow of the tesstrain (https://github.com/tesseract-ocr/tesstrain) 
repository? Or could I also use the tesstrain repo to create the .lstmf files?
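Regarding the unicharset and tesstrain questions: if I read the tesstrain Makefile correctly, `make lists` builds the box files, the .lstmf files and list.train/list.eval from ground truth in data/MODEL_NAME-ground-truth, and with START_MODEL set, `make training` also extracts the start model's LSTM unicharset, merges it with a unicharset built from the transcriptions, and turns the result into a starter traineddata with combine_lang_model (that file is what --traineddata then points at). The second function sketches the merge step by hand with unicharset_extractor, combine_tessdata -u and merge_unicharsets. Function names, paths and --norm_mode 2 (assumed to match tesstrain's default) are my assumptions; please check them against your checkout.

```shell
# Sketch only. Assumes a local tesstrain checkout and its documented Makefile
# variables MODEL_NAME, START_MODEL and TESSDATA; argument values are placeholders.
# tesstrain reads NAME.tif + NAME.gt.txt pairs from
#   <tesstrain>/data/<MODEL_NAME>-ground-truth/
# and writes box/.lstmf files plus list.train and list.eval to data/<MODEL_NAME>/.
tesstrain_lists() {
  local tesstrain_dir=$1 model_name=$2 tessdata_dir=$3
  make -C "$tesstrain_dir" lists \
    MODEL_NAME="$model_name" START_MODEL=eng TESSDATA="$tessdata_dir"
}

# Hand-made version of the unicharset step (hypothetical function name):
# 1. build a unicharset from all transcriptions,
# 2. unpack the start model to get its lstm-unicharset,
# 3. merge both so new symbols are added to eng's characters.
merge_with_start_unicharset() {
  local gt_dir=$1 start_traineddata=$2 out_dir=$3
  mkdir -p "$out_dir"
  # find -exec avoids an over-long argument list with 30,000 files.
  find "$gt_dir" -maxdepth 1 -name '*.gt.txt' -exec cat {} + > "$out_dir/all-gt"
  # --norm_mode 2 is assumed to match tesstrain's default NORM_MODE.
  unicharset_extractor --output_unicharset "$out_dir/my.unicharset" \
    --norm_mode 2 "$out_dir/all-gt"
  # -u unpacks every component using the given prefix,
  # e.g. $out_dir/start.lstm-unicharset
  combine_tessdata -u "$start_traineddata" "$out_dir/start."
  merge_unicharsets "$out_dir/start.lstm-unicharset" \
    "$out_dir/my.unicharset" "$out_dir/unicharset"
}
```

For example (hypothetical paths): `tesstrain_lists ~/tesstrain mymodel /usr/share/tesseract-ocr/4.00/tessdata`, where the tessdata directory should hold the tessdata_best eng.traineddata.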

2. I also have to train the same symbol twice, with different meanings: it is 
the same sign, but in one case it is rotated 90 degrees counter-clockwise. 

For example, assume the sign is "⊥". When this character is recognized, I want 
my fully trained model to output "⊥", but when the counter-clockwise rotated 
sign is recognized, I want the string "turned⊥" as output.
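One possible approach for question 2, which is my own suggestion rather than something from the Tesseract docs: label the rotated sign with its own code point in the .gt.txt files, so it becomes a separate class in the unicharset, and map that code point to "turned⊥" after recognition. I would avoid writing "turned⊥" itself into the ground truth, since the model would then presumably have to learn it as the seven characters t, u, r, n, e, d, ⊥. The stand-in U+E000 (a Private Use Area code point) and the function name expand_turned are hypothetical choices.

```shell
# Sketch. U+E000 is an arbitrary, hypothetical stand-in for
# "⊥ rotated 90 degrees counter-clockwise"; write it into the .gt.txt
# files wherever the rotated sign appears.
TURNED_MARK=$(printf '\356\200\200')   # UTF-8 bytes EE 80 80 = U+E000

# Filter stdin to stdout, replacing the stand-in with the wanted string.
# LC_ALL=C makes sed match the raw UTF-8 bytes.
expand_turned() {
  LC_ALL=C sed "s/$TURNED_MARK/turned⊥/g"
}
```

For example (model name hypothetical): `tesseract line.tif stdout --psm 13 -l mymodel | expand_turned`.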


I would really appreciate any help. I'm at a dead end and can't find any 
information that helps.

Thanks in advance. If you have any questions about my problem, I will 
gladly provide any information you need.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.