>With the alpha version, I trained on about 1000 lines and the result was not so bad

You must have only fine-tuned an existing model then, and now you are trying
to train from scratch.
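
As a rough illustration of the difference, here is a minimal shell sketch. It
assumes that one lstmtraining iteration processes one line image, and every
path in it is a hypothetical placeholder, not taken from your setup. It first
estimates how much of an 18000-line set 5000 iterations can cover, then prints
(without running them) the kind of commands that continue from an existing
model instead of starting from a --net_spec.

```shell
#!/usr/bin/env bash
# Figures quoted in this thread.
lines=18000       # lines of training text
iterations=5000   # value given to --max_iterations

# Assumption: each iteration consumes one line image.
percent_seen=$(( iterations * 100 / lines ))
echo "From scratch, at most ${percent_seen}% of the lines are seen even once."

# Hypothetical placeholder paths; adjust to the local setup.
base_model=/path/to/tessdata_best/fas.traineddata
work_dir=/path/to/finetune

# Print the fine-tuning commands instead of running them,
# so this sketch works without Tesseract installed.
cat <<EOF
combine_tessdata -e $base_model $work_dir/fas.lstm
lstmtraining \\
  --continue_from $work_dir/fas.lstm \\
  --traineddata $base_model \\
  --train_listfile $work_dir/fas.training_files.txt \\
  --model_output $work_dir/fas_finetuned \\
  --max_iterations $iterations
EOF
```

Fine tuning starts from weights that already recognise the script, which is
why a few thousand iterations can be enough there but not when training from
scratch.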

On Wed, 26 Sep 2018, 04:01 Khosrobeigy.zohreh, <beigy.zoh...@gmail.com>
wrote:

> I know; I actually know LSTMs well. I want to resolve all the errors first
> and then train on a large text.
> With the alpha version, I trained on about 1000 lines and the result was not
> so bad. But with version beta 4 I get many errors.
> In alpha,
> # Use LSTM
> tessedit_ocr_engine_mode 1
> tessedit_pageseg_mode 6
>
> # Arabic page layout variables
> segment_nonalphabetic_script 1
>
> # Avoid dropping rows
> textord_noise_rowratio 20.0
> textord_noise_syfract 0.6
>
> textord_min_linesize 2.5
>
> # Avoid over-estimating intra-word spacing at both row and
> # block levels when using old to method
> tosp_old_to_method T
> tosp_old_to_constrain_sp_kn T
> tosp_old_sp_kn_th_factor 4.0
>
> tosp_only_small_gaps_for_kern T
> tosp_use_pre_chopping T
> I used all of these, but now my model doesn't learn.
> Has anything changed in beta 4, for example in text2image?
>
> On Wed, Sep 26, 2018 at 12:53 AM Shree Devi Kumar <shreesh...@gmail.com>
> wrote:
>
>>   --fontlist "Arial"
>>
>> Does that have good coverage for Farsi?
>>
>>
>> --max_iterations 5000
>>
>> You are trying to train from scratch with 18000 lines of text and only
>> 5000 iterations. That will not work.
>>
>> Ray has trained on hundreds of thousands of lines of text and millions of
>> iterations.
>>
>> On Tue, 25 Sep 2018, 16:20 Zohreh Khosrobeygi, <beigy.zoh...@gmail.com>
>> wrote:
>>
>>> Hi, I am using this:
>>> tesseract 4.0.0-beta.4
>>>  leptonica-1.74.4
>>>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 :
>>> zlib 1.2.8
>>>
>>>  Found AVX2
>>>  Found AVX
>>>  Found SSE
>>> I've trained on about 18000 lines for the Persian language. I used this command:
>>>
>>> bash -x tesstrain.sh --fonts_dir /usr/share/fonts --lang fas \
>>>   --training_text /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.training_text.txt \
>>>   --wordlist /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.wordlist.txt \
>>>   --linedata_only \
>>>   --noextract_font_properties \
>>>   --langdata_dir /home/zohreh/Desktop/tesseract-master/src/training/langdata \
>>>   --tessdata_dir /home/zohreh/Desktop/tesseract-master/tessdata \
>>>   --fontlist "Arial" \
>>>   --output_dir /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2
>>> and then run this:
>>> sudo /home/zohreh/Desktop/tesseract-master/src/training/lstmtraining \
>>>   --traineddata /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2/fas/fas.traineddata \
>>>   --net_spec '[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx192O1c1]' \
>>>   --model_output /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Out/base \
>>>   --learning_rate 0.001 \
>>>   --train_listfile /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2/fas.training_files.txt \
>>>   --eval_listfile /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/v/fas.training_files.txt \
>>>   --max_iterations 5000 \
>>>   &>/home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Out/basetrain.log
>>> but it always shows "Compute CTC targets failed" and the model does not
>>> perform well at all.
>>> I normalized my text, and each line of the text has at most 20 tokens.
>>> Could you please help me?
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-ocr+unsubscr...@googlegroups.com.
>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/04872dc6-7d92-4f95-9f65-8bb0cbf87c8c%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/04872dc6-7d92-4f95-9f65-8bb0cbf87c8c%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>
>
> --
> Zohreh Khosrobeygi
> University of Tehran, 2016
> Tel: +989196042887
> khosrobeygi.zo...@ut.ac.ir <khosrobeygi.zoh...@ut.ac.ir>
>
