Could you also please give me some advice on training?

I am currently training Vietnamese for Times New Roman only.

The best traineddata is accurate, but it is large (it covers all fonts) and
takes quite a long time to process.

I plan to train from scratch,
*...*


*--eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
--max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log*

After 5000 iterations, *Error rate = 76.676*, which is very high.

What should I do next?
Would it improve anything if I reran the above training a second or third
time (with the same data in *--train_listfile ~*)? My understanding is that
the traineddata is updated on each run.
Is there a way to extract a traineddata for a few selected fonts from the
best traineddata?
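
For the last two questions, a common alternative to training from scratch
is to fine-tune the existing best model on the single font. A minimal sketch
of that route follows, using combine_tessdata -e and the lstmtraining options
--continue_from and --stop_training. The location of vie.traineddata, the
~/tesstutorial/vie_tnr working directory, the "tnr" output prefix, and the
iteration count of 400 are all assumptions for illustration, not values from
this thread. The script only prints the commands so they can be checked
before anything is run.

```shell
#!/bin/sh
# Dry-run sketch: each function prints one command instead of running it.
# BEST and WORK are assumed locations, not paths taken from this thread.
BEST=${BEST:-"$HOME/tessdata_best/vie.traineddata"}
WORK=${WORK:-"$HOME/tesstutorial/vie_tnr"}

# 1. Extract the LSTM network from the best traineddata.
extract_cmd() {
  printf 'combine_tessdata -e %s %s/vie.lstm\n' "$BEST" "$WORK"
}

# 2. Fine-tune that network on the single-font training list.
#    $1 is the iteration limit (an illustrative value, tune as needed).
finetune_cmd() {
  printf 'lstmtraining --model_output %s/tnr' "$WORK"
  printf ' --continue_from %s/vie.lstm' "$WORK"
  printf ' --traineddata %s' "$BEST"
  printf ' --train_listfile %s/vie.training_files.txt' "$WORK"
  printf ' --max_iterations %s\n' "$1"
}

# 3. Convert the resulting checkpoint into a usable traineddata file.
finalize_cmd() {
  printf 'lstmtraining --stop_training'
  printf ' --continue_from %s/tnr_checkpoint' "$WORK"
  printf ' --traineddata %s' "$BEST"
  printf ' --model_output %s/vie_tnr.traineddata\n' "$WORK"
}

extract_cmd
finetune_cmd 400
finalize_cmd
```

Fine-tuning keeps the network architecture of the best model, so this route
targets accuracy on the chosen font rather than file size or speed.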

Thanks,

TuPM

On Friday, August 7, 2020 at 9:30:33 AM UTC+7 minh...@gmail.com wrote:

> Many thanks Shree,
>
> As you suggested, I removed the path, and it works now.
>
> By the way, my tesseract and lstmtraining versions:
>
> tesseract 5.0.0-alpha-773-gd33ed
> leptonica-1.78.0
>
> ~ % lstmtraining -v 
> 5.0.0-alpha-773-gd33ed
> On Friday, August 7, 2020 at 8:43:02 AM UTC+7 shree wrote:
>
>> If you have tesseract and all training tools installed, you should be 
>> able to use 
>> tesseract
>> lstmtraining
>> etc without giving the path.
>>
>> What's the output of
>>
>> which tesseract
>> tesseract -v
>> which lstmtraining
>> lstmtraining -v
>>
>>
>>
>> On Fri, Aug 7, 2020, 01:13 minh...@gmail.com <minh...@gmail.com> wrote:
>>
>>> Sorry, I forgot to note:
>>>
>>> I use macOS 10.15.6 Catalina.
>>>
>>> The tesseract version: tesseract 5.0.0-alpha-773-gd33ed
>>>
>>> Also, tesseract is installed via MacPorts, since installing it via brew
>>> produced a lot of errors.
>>>
>>> Thanks,
>>> On Friday, August 7, 2020 at 2:40:06 AM UTC+7 minh...@gmail.com wrote:
>>>
>>>> Dear friends,
>>>>
>>>> I have tried to run tesseract training following the guide at:
>>>> https://github.com/tesseract-ocr/tesseract/issues/1453
>>>>
>>>> I got as far as step 10:
>>>>
>>>> SCROLLVIEW_PATH=~/tesseract/java \
>>>> ~/tesseract/src/training/lstmtraining \
>>>> --debug_interval 100 \
>>>> --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
>>>> --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>>>> --model_output ~/tesstutorial/engoutput/base \
>>>> --learning_rate 20e-4 \
>>>> --debug_interval -1 \
>>>> --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
>>>> --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
>>>> --max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log
>>>>
>>>> Then nothing happens. The basetrain.log contains:
>>>> *zsh: no such file or directory: 
>>>> /Users/minhtupham/tesseract/src/training/lstmtraining*
>>>>
>>>> Is the lstmtraining file missing?
>>>> I checked the training folder, and it contains a file named "lstmtraining.cpp".
>>>>
>>>> Could you please tell me what I have to do?
>>>>
>>>> Many thanks,
>>>>
>>>> TuPM
>>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to tesseract-oc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/tesseract-ocr/b45b1f8d-4e84-482b-b0f1-03670a14055en%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/tesseract-ocr/b45b1f8d-4e84-482b-b0f1-03670a14055en%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>

