Re: [tesseract-ocr] Trying to add chars to tesseract 4.0

ShreeDevi Kumar Mon, 11 Dec 2017 20:42:05 -0800

Your script looks OK.

The line

--U  $train_output_dir/eng/eng.unicharset \   # not sure if this is necessary; doesn't make a difference

is NOT required.


I suggest that you remove the files from any earlier run before running
the script.
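
A minimal sketch of that cleanup, assuming $train_output_dir contains only generated files that are safe to delete. The function name clean_training_output is hypothetical, not part of Tesseract:

```shell
# Hypothetical helper: empty the training output directory before a new run.
# Assumption: the directory holds only files produced by earlier runs.
clean_training_output() {
    # Refuse an empty or root path so an unset variable cannot wipe "/".
    if [ -z "$1" ] || [ "$1" = "/" ]; then
        echo "refusing to clean '$1'" >&2
        return 1
    fi
    rm -rf -- "$1"      # remove checkpoints, lstmf files, old unicharset, etc.
    mkdir -p -- "$1"    # recreate the directory, now empty
}

# Usage (not executed here):
#   clean_training_output "$train_output_dir"
```

The guard matters because `rm -rf "$train_output_dir"` with an unset variable would otherwise be run against an empty path.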

Take a look at  $train_output_dir/eng directory and review the unicharset
there to see whether your new characters are included in the unicharset.
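
One way to do that check from the shell, as a rough sketch. It assumes the usual unicharset layout (first line is the number of entries, and every following line starts with the character itself followed by its properties). The function names are hypothetical helpers, not Tesseract tools:

```shell
# Hypothetical helper: succeed if character "$2" is listed in unicharset "$1".
# Assumption: line 1 is the entry count and each later line begins with the
# unichar as its first whitespace-separated field.
unicharset_has_char() {
    # Appending "" forces a string comparison, so digits are not compared
    # numerically. Note that awk -v interprets backslash escapes in the value.
    awk -v ch="$2" 'NR > 1 && ($1 "") == (ch "") { found = 1 }
                    END { exit !found }' "$1"
}

# Hypothetical helper: print the entry count stored on the first line.
unicharset_count() {
    head -n 1 "$1"
}

# Usage (not executed here; the listed characters are placeholders for
# whatever you are adding):
#   for ch in '€' '§'; do
#       unicharset_has_char "$train_output_dir/eng/eng.unicharset" "$ch" \
#           || echo "missing from unicharset: $ch"
#   done
```

Comparing unicharset_count against the count from the original eng.unicharset is a quick way to see whether anything was added at all.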

Take a look at the log file, especially the initial portion, to see
whether it shows an increase in the number of characters.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Dec 12, 2017 at 9:24 AM, J Klein <jetm...@gmail.com> wrote:

>
> On Thursday, December 7, 2017 at 11:55:53 PM UTC-5, shree wrote:
>>
>> Please check the last section on
>>  https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>>
>
> Thank you for this tip. I'm getting farther than before. I thought
> --traineddata was my final traineddata output file.
> I have now made the final eng.traineddata with 'lstmtraining --stop_training ...'
> as follows:
>
>     # --U: not sure if this is necessary; doesn't make a difference
>     $tesstrain_dir/lstmtraining \
>       --stop_training \
>       --continue_from $train_output_dir/pluschars_checkpoint \
>       --traineddata $train_output_dir/eng/eng.traineddata \
>       --U $train_output_dir/eng/eng.unicharset \
>       --model_output $final_trained_data_file
>
> And I get a $final_trained_data_file that I can use to replace
> /usr/local/share/tessdata/eng.traineddata and it doesn't fail on init3()
> any more.  But it doesn't recognize any of the new chars either.
> However, in running
>
>   /usr/local/bin/tesseract-training/lstmeval \
>     --model ./trained_plus_chars/pluschars_checkpoint  \
>     --traineddata ./trained_plus_chars/eng/eng.traineddata \
>     --eval_listfile ./trained_plus_chars/eng.training_files.txt
>
> it DID recognize the new chars most of the time. So I think there may
> still be something wrong with the construction of the --model_output
> $final_trained_data_file.
>
> My entire training sequence bash script is here:
> https://pastebin.com/gNLvXkiM
>
> Can you tell if there is anything obviously wrong?
>
>
> Thanks
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduVp9_2bwOYdWsFnLsWusK_N9p3htvGJEd4X7UjmmTskNA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
