Shree, thanks for your reply.

I have another problem in this project that I need your help with:

Some characters in my data are italicized and need to be recognized, but 
recognition accuracy on these italic characters tends to be low. Can I add 
italic characters to the training data for our model? 

As far as I can see, italic characters cannot be added to the 
chi_sim.training_text file 
<https://github.com/tesseract-ocr/langdata/blob/master/chi_sim/chi_sim.training_text> 
in the langdata directory at 
https://github.com/tesseract-ocr/langdata/tree/master/chi_sim

How would I train these italic characters?
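
One possible direction, sketched under assumptions rather than confirmed by anyone in this thread: the training text is a plain text file, so the italic style would have to come from the font used for rendering, not from the text. As far as I understand, --fontlist in tesstrain.sh accepts several font names, so an italic or slanted face could be listed next to SIMSUN if one is installed. The font name below is a placeholder (many CJK fonts ship no italic face), and `text2image --fonts_dir /usr/share/fonts --list_available_fonts` should show which names are actually usable.

```shell
# Sketch only: print the tesstrain.sh arguments with an extra italic face.
# ITALIC_FONT is a hypothetical placeholder; replace it with a name that
# text2image --list_available_fonts reports on your system.
ITALIC_FONT="${ITALIC_FONT:-Example CJK Font Italic}"

tesstrain_args() {
  # One argument per line, so font names containing spaces stay readable.
  printf '%s\n' \
    training/tesstrain.sh \
    --fonts_dir /usr/share/fonts \
    --lang chi_sim \
    --linedata_only \
    --noextract_font_properties \
    --langdata_dir ../langdata \
    --fontlist "SIMSUN" "$ITALIC_FONT" \
    --tessdata_dir ./tessdata \
    --output_dir "$HOME/tesstutorial/trainspecial"
}

# Dry run: show the arguments for review instead of starting the training.
tesstrain_args
```

The sketch only prints the arguments; the same list would be passed to tesstrain.sh once the font name has been checked.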

On Thursday, September 14, 2017 at 4:30:40 PM UTC+8, shree wrote:
>
> It is a known problem with the latest code in github - see 
> https://github.com/tesseract-ocr/tesseract/issues/1114
>
> Waiting for fix from Ray.
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Thu, Sep 14, 2017 at 1:50 PM, <[email protected]> wrote:
>
>> Hello,
>>
>> I'm trying to train my own traineddata model with Tesseract 4.0, following 
>> the commands in the *TrainingTesseract 4.00* tutorial. The first command, 
>> which creates the training data, is as follows:
>>
>> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang chi_sim \
>>   --linedata_only --noextract_font_properties --langdata_dir ../langdata \
>>   --fontlist "SIMSUN" --tessdata_dir ./tessdata \
>>   --output_dir ~/tesstutorial/trainspecial
>>
>>
>> And the execution log for this command is as follows:
>>
>> === Phase I: Generating training images ===
>> Rendering using SIMSUN
>> [2017年 09月 14日 星期四 16:01:57 CST] /usr/local/bin/text2image 
>> --fontconfig_tmpdir=/tmp/font_tmp.whlzhytMkp --fonts_dir=/usr/share/fonts 
>> --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 
>> --outputbase=/tmp/tmp.8JcoYdZI17/chi_sim/chi_sim.SIMSUN.exp0 --max_pages=3 
>> --font=SIMSUN --text=../langdata/chi_sim/chi_sim.training_text
>> Rendered page 0 to file 
>> /tmp/tmp.8JcoYdZI17/chi_sim/chi_sim.SIMSUN.exp0.tif
>>
>> === Phase UP: Generating unicharset and unichar properties files ===
>> [2017年 09月 14日 星期四 16:01:58 CST] /usr/local/bin/unicharset_extractor 
>> --output_unicharset /tmp/tmp.8JcoYdZI17/chi_sim/chi_sim.unicharset 
>> --norm_mode 1 /tmp/tmp.8JcoYdZI17/chi_sim/chi_sim.SIMSUN.exp0.box
>> Extracting unicharset from box file 
>> /tmp/tmp.8JcoYdZI17/chi_sim/chi_sim.SIMSUN.exp0.box
>> Invalid Unicode codepoint: 0xffffffe8
>> IsValidCodepoint(ch):Error:Assert failed:in file normstrngs.cpp, line 225
>> ERROR: /tmp/tmp.8JcoYdZI17/chi_sim/chi_sim.unicharset does not exist or 
>> is not readable
>>
>>
>> An error occurs during this step: chi_sim.unicharset could not be 
>> extracted. I checked the directory /tmp/tmp.8JcoYdZI17/chi_sim/, 
>> and the chi_sim.unicharset file does not exist.
>>
>> How can I fix this error? Can you help me? Thanks.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/9b9b26b8-5fc8-42aa-bd7c-2305dffc6fd1%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
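
A note on the number in the quoted log, offered as an inference from the value alone and not as a confirmed diagnosis: 0xffffffe8 is what the byte 0xE8 becomes when it is read as a signed 8-bit value and widened to 32 bits. 0xE8 is the first UTF-8 byte of many Chinese characters (for example 识, U+8BC6, encodes as E8 AF 86), so the message may point to signed-char handling of UTF-8 input rather than to a bad character in the training text. The helper name below is made up for illustration.

```shell
# First UTF-8 byte of 识 (U+8BC6), written as octal escapes for E8 AF 86.
first_byte=$(printf '\350\257\206' | od -An -tu1 | awk '{print $1}')

# Reinterpret an unsigned byte value as a signed 8-bit integer, then show it
# as a 32-bit hex value (illustrative helper, not part of Tesseract).
sign_extend_byte() {
  b=$1
  if [ "$b" -ge 128 ]; then
    b=$((b - 256))
  fi
  printf '0x%08x\n' $(( b & 0xFFFFFFFF ))
}

sign_extend_byte "$first_byte"
```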

