Hi, I hope the information below helps :)
Creating the trained data file own.traineddata:

Create box files:

    tesseract /path/to/image.tif path/and/nameof/boxfile/image lstmbox

Create unicharset file:

    unicharset_extractor --norm_mode 1 --output_unicharset ./out/own.unicharset /path/to/image1.box /path/to/image2.box /path/to/imageX.box

Create starter traineddata (includes the recoder):

    combine_lang_model --input_unicharset ./out/own.unicharset --script_dir ./out --words ./out/eng.wordlist.txt --numbers ./out/eng.numbers.txt --puncs ./out/eng.punc.txt --output_dir ./out --lang own

Create training files (for each image). This writes /path/to/image1.exp0.lstmf:

    tesseract /path/to/image1.tif /path/to/image1.exp0 --psm 6 lstm.train

Train. The file given to --train_listfile lists the .lstmf files from the previous step, one path per line:

    lstmtraining --traineddata ./out/own/own.traineddata --model_output ./output/own --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c110]" --train_listfile ./eng_ltsm/eng.training_files.txt --eval_listfile ./eng_ltsm/eng.training_files.txt --max_iterations 100

Create final traineddata:

    lstmtraining --stop_training --continue_from ./output/own_checkpoint --traineddata ./out/own/own.traineddata --model_output ./output/own.traineddata

On Wednesday, 27 May 2020 21:36:08 UTC+5:30, Renan Neri Pereira wrote:
>
> Hello Guys,
>
> I want to train Tesseract OCR to recognize some documents. I have
> some images and box files, but I don't know how to generate traineddata
> from them. I think the tutorial for training from box files is not
> very good.
>
> Can anyone help me with that?
>
> Thanks
>
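
P.S. The list file passed to --train_listfile in the steps of this reply can be generated rather than written by hand. Below is a minimal sketch in plain sh. The function name make_listfile and its two arguments are made up for illustration, and it assumes that all .lstmf files produced by the lstm.train step sit in a single directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): collect every .lstmf file
# in a directory into a list file, one path per line.
# Usage: make_listfile <dir-with-lstmf-files> <list-file-to-write>
make_listfile() {
    src_dir=$1
    list_file=$2

    # Start from an empty list file so reruns do not append duplicates.
    : > "$list_file"

    for f in "$src_dir"/*.lstmf; do
        # If nothing matched, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        printf '%s\n' "$f" >> "$list_file"
    done
}
```

For example, after sourcing the snippet, `make_listfile /path/to ./out/own.training_files.txt` would write the list, and that file name (also just an example) would then go to --train_listfile. Using a separate set of files for --eval_listfile is probably a better idea than reusing the training list, if you have enough images.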