[tesseract-ocr] Generate LSTM training files from existing TIF and box files for Tesseract 5 on macOS

minh...@gmail.com Sun, 09 Aug 2020 02:36:56 -0700

Dear friends,

I want to train Tesseract's LSTM engine on some scanned documents.
Since the scans are not very good, I created the corresponding box files
with jTessBoxEditor, but many boxes and characters were recognized poorly
and had to be corrected by hand.
After a few days of work, I now have three files:
vie.timesnewromani.exp99.tif
vie.timesnewromani.exp99.box
vie.timesnewromani.exp99.tr


Now I need to convert them into LSTM training data, so I modified
tesstrain.sh as follows:

mkdir -p ${TRAINING_DIR}
tlog "\n=== Starting training for language '${LANG_CODE}'"

# Added: copy the hand-corrected box/tif pairs into the training directory
cp ~/tesstutorial/langdata/${LANG_CODE}/*.box ${TRAINING_DIR}
cp ~/tesstutorial/langdata/${LANG_CODE}/*.tif ${TRAINING_DIR}

source "$(dirname $0)/language-specific.sh"
set_lang_specific_parameters ${LANG_CODE}
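One thing worth checking in the cp lines: with bash's default settings, a glob such as `*.box` that matches no files is passed to cp literally, so cp fails with "No such file or directory" and nothing is copied. That would happen if the files are not actually in `~/tesstutorial/langdata/${LANG_CODE}` (for example, if they were placed in a different langdata/vie/ directory). The sketch below is a hypothetical helper, `copy_training_pairs`, not part of tesstrain.sh; it assumes bash and turns an empty match into an explicit error.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of tesstrain.sh): copy the hand-made
# .box/.tif pairs, but fail loudly if the source glob matches nothing.
copy_training_pairs() {
  local src_dir="$1" dest_dir="$2"
  local old_nullglob
  local -a files

  old_nullglob=$(shopt -p nullglob || true)  # remember the caller's setting
  shopt -s nullglob                          # unmatched globs expand to nothing
  files=("$src_dir"/*.box "$src_dir"/*.tif)
  eval "$old_nullglob"                       # restore the caller's setting

  if [ "${#files[@]}" -eq 0 ]; then
    echo "No .box or .tif files found in '$src_dir'" >&2
    return 1
  fi

  mkdir -p "$dest_dir" || return 1
  cp "${files[@]}" "$dest_dir" || return 1
  echo "Copied ${#files[@]} file(s) into '$dest_dir'"
}
```

Used in place of the two cp lines, e.g. `copy_training_pairs ~/tesstutorial/langdata/"${LANG_CODE}" "${TRAINING_DIR}"`, it returns non-zero and prints the directory it searched when nothing matches, which makes a path mismatch easy to spot.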

I copied all three files to langdata/vie/, but it seems the files were
not copied into the temporary training folder.
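Separately, tesstrain.sh is largely built around rendering synthetic training images from text with text2image, so feeding it hand-made pairs may not be the simplest route. Tesseract can also write an .lstmf file directly: run with the `lstm.train` config on an image whose .box file (same base name) sits next to it, it uses the boxes as ground truth. As far as I know, this is the per-image step the tesstrain Makefile runs. The loop below is a sketch under these assumptions: `tesseract` is on PATH, it can load some traineddata file, and `--psm 6` (a single uniform block of text) suits these full-page scans. `make_lstmf_files` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build one .lstmf per .tif that has a matching .box,
# using tesseract's lstm.train config.
make_lstmf_files() {
  local dir="$1"
  local tif base count=0

  for tif in "$dir"/*.tif; do
    [ -e "$tif" ] || continue              # the glob matched nothing
    base="${tif%.tif}"
    if [ ! -f "$base.box" ]; then
      echo "Skipping '$tif': no '$base.box' next to it" >&2
      continue
    fi
    # Writes "$base.lstmf" from the image plus its hand-corrected boxes.
    tesseract "$tif" "$base" --psm 6 lstm.train || return 1
    count=$((count + 1))
  done

  echo "Generated $count .lstmf file(s) in '$dir'"
}
```

The resulting .lstmf paths can then be listed, one per line, in a text file passed to lstmtraining via `--train_listfile`. The .tr file is not used by this step.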

Please give me some advice.

Many thanks,

TuPM



