Hi,

Hope the information below helps :)

Creating the trained data file own.traineddata:

Create box files:

    tesseract /path/to/image.tif path/and/nameof/boxfile/image lstmbox
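
With many images, the box-file command can be wrapped in a loop. The following is only a sketch: the function name make_box_files and the TESSERACT override variable are hypothetical, and it assumes the images are .tif files in a single directory, with each box file written next to its image under the same base name.

```shell
# Sketch: create a .box file next to every .tif in a directory.
# TESSERACT is a hypothetical override so a different binary can be used.
make_box_files() {
    dir="$1"
    for img in "$dir"/*.tif; do
        [ -e "$img" ] || continue   # skip when no .tif files matched
        # Output base is the image path without .tif, so image1.tif -> image1.box
        "${TESSERACT:-tesseract}" "$img" "${img%.tif}" lstmbox
    done
}
```

Usage would be something like: make_box_files /path/to/images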

Create unicharset file:

    unicharset_extractor --norm_mode 1 \
        --output_unicharset ./output/folder/own.unicharset \
        /path/to/image1.box /path/to/image2.box /path/to/imageX.box

Create starter traineddata (aka recoder):

    combine_lang_model --input_unicharset ./out/own.unicharset \
        --script_dir ./out --words ./out/eng.wordlist.txt \
        --numbers ./out/eng.numbers.txt --puncs ./out/eng.punc.txt \
        --output_dir ./out --lang own

Create training files (for each image):

    tesseract /path/to/image1.tif /path/to/image1.exp0 --psm 6 lstm.train
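
The lstm.train step writes one .lstmf file per image, and the --train_listfile option of lstmtraining expects a text file listing those .lstmf paths, one per line. A minimal sketch of producing both in one go is below. The function name make_training_list and the TESSERACT override variable are hypothetical, and it assumes each image has its .box file beside it with the same base name.

```shell
# Sketch: run lstm.train on every .tif, then list the resulting .lstmf files.
# TESSERACT and make_training_list are hypothetical names; adjust paths as needed.
make_training_list() {
    dir="$1"
    list_file="$2"
    for img in "$dir"/*.tif; do
        [ -e "$img" ] || continue   # skip when no .tif files matched
        # Writes <base>.lstmf, where <base> is the image path without .tif
        "${TESSERACT:-tesseract}" "$img" "${img%.tif}" --psm 6 lstm.train
    done
    : > "$list_file"                # start with an empty list file
    for f in "$dir"/*.lstmf; do
        [ -e "$f" ] || continue
        printf '%s\n' "$f" >> "$list_file"   # one .lstmf path per line
    done
}
```

Usage would be something like: make_training_list /path/to/images ./own.training_files.txt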

Train:

    lstmtraining --traineddata ./out/own/own.traineddata \
        --model_output ./output/own \
        --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c110]" \
        --train_listfile ./eng_ltsm/eng.training_files.txt \
        --eval_listfile ./eng_ltsm/eng.training_files.txt \
        --max_iterations 100

Create final traineddata:

    lstmtraining --stop_training --continue_from ./output/own_checkpoint \
        --traineddata ./out/own/own.traineddata \
        --model_output ./output/own.traineddata
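
To try the resulting model, tesseract can be pointed at the folder holding own.traineddata with --tessdata-dir and the language selected with -l own. This is just a sketch: run_own_model and the TESSERACT override variable are hypothetical names, and the folder is assumed to be wherever the final own.traineddata was written (./output in the commands above).

```shell
# Sketch: OCR one image with the freshly built own.traineddata.
# TESSERACT and run_own_model are hypothetical names.
run_own_model() {
    img="$1"
    tessdata_dir="$2"
    if [ ! -f "$tessdata_dir/own.traineddata" ]; then
        echo "own.traineddata not found in $tessdata_dir" >&2
        return 1
    fi
    # "stdout" as the output base prints the recognized text to the terminal
    "${TESSERACT:-tesseract}" "$img" stdout --tessdata-dir "$tessdata_dir" -l own
}
```

Usage would be something like: run_own_model /path/to/image.tif ./output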


On Wednesday, 27 May 2020 21:36:08 UTC+5:30, Renan Neri Pereira wrote:
>
> Hello Guys,
>
> I'm trying to train Tesseract OCR to recognize some documents. I have 
> some images and box files, but I don't know how to generate traineddata from 
> them. I think the tutorial for training from box files is a little 
> bad.
>
> Can anyone help me with that?
>
> Thanks
>
