Dear all,
I am trying to run a mass training with tesstrain.sh (I have also applied the patch at
<https://code.google.com/p/tesseract-ocr/source/diff?spec=svn93f7899a9e9afa5411eca6b4ec4831d0b49236f5&name=93f7899a9e9a&r=93f7899a9e9afa5411eca6b4ec4831d0b49236f5&format=side&path=/training/tesstrain.sh>
), but I am still not able to clear my hurdles.
This is the command I used:
./tesstrain.sh \
--bin_dir /usr/local/bin/ \
--fonts_dir /usr/share/fonts/ \
--lang tam \
--langdata_dir /home/tesseract/training/langdata \
--output_dir /home/tesseract/tam_train/output/ \
--training_text /home/sibi/Desktop/outputscrambled.txt \
--wordlist /home/sibi/Desktop/word_list_lexicon.txt \
--tessdata_dir /usr/local/share/tessdata \
--fontlist "TAU_VASN"
and I got the following output:
tee: /tam/tesstrain.log: No such file or directory
=== Starting training for language 'tam'
tee: /tam/tesstrain.log: No such file or directory
Cleaning workspace directory /tam...
mkdir: cannot create directory ‘/tam’: Permission denied
tee: /tam/tesstrain.log: No such file or directory
=== Phase I: Generating training images ===
tee: /tam/tesstrain.log: No such file or directory
Rendering using TAU_VASN
tee: /tam/tesstrain.log: No such file or directory
[Thu Apr 16 20:01:01 IST 2015] /usr/local/bin//text2image --leading=32
--fonts_dir=/usr/share/fonts/ --box_padding=0 --strip_unrenderable_words
--char_spacing=0.0 --exposure=0 --font=TAU_VASN
--outputbase=/tam/tam.TAU_VASN.exp0
--text=/home/sibi/Desktop/outputscrambled.txt
tee: /tam/tesstrain.log: No such file or directory
Initializing fontconfig
Could not find font named TAU_VASN
FLAGS_find_fonts ||
FontUtils::IsAvailableFont(FLAGS_font.c_str()):Error:Assert failed:in file
text2image.cpp, line 417
tee: /tam/tesstrain.log: No such file or directory
ERROR: Program text2image failed. Abort.
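A side note on the repeated "/tam" lines above: they suggest that whatever base directory the script puts in front of the language code came out empty, so the workspace ended up directly under the filesystem root. Below is a minimal sketch of that effect. The variable names base_dir and training_dir are hypothetical and only for illustration; tesstrain.sh may build its path differently.

```shell
#!/bin/sh
# Hypothetical names, for illustration only (not taken from tesstrain.sh).
base_dir=""                          # an empty (or unset) base directory
lang=tam

# Joining an empty prefix with "/" yields a path under the root directory.
training_dir="${base_dir}/${lang}"

echo "$training_dir"                 # prints /tam
```

If that is what happened, creating "/tam" as a normal user would fail with "Permission denied", which would also explain why tee cannot open the log file.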
What exactly does "Could not find font named TAU_VASN" mean? I read the
tesstrain.sh introduction again, which says:
"# NOTE: The font names specified in --fontlist need to be recognizable by
Pango using fontconfig. An easy way to list the canonical names of all
fonts available on your system is to run text2image with
--list_available_fonts and the appropriate --fonts_dir path."
So I ran the following command:
text2image --list_available_fonts --fonts_dir usr/share/fonts
which gave this output:
Initializing fontconfig
I am not able to get any interpretable data from this. The font is
present in /usr/share/fonts, so why is it not being recognised?
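Two checks that might narrow this down, sketched here under assumptions: first, that the fonts directory is given as an absolute path (the text2image command above has no leading slash), and second, that fontconfig's own fc-list tool is installed and can be asked for the family names it knows. The helper names font_listed and is_abs_dir are hypothetical, and the search term "TAU" is only a guess at part of the family name.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract): succeeds if the font listing
# on stdin contains the given name, ignoring case.
font_listed() {
  grep -i -q -F -- "$1"
}

# Hypothetical helper: succeeds only for an absolute path to an existing directory.
is_abs_dir() {
  case "$1" in
    /*) [ -d "$1" ] ;;
    *)  return 1 ;;
  esac
}

fonts_dir=/usr/share/fonts

if ! is_abs_dir "$fonts_dir"; then
  echo "fonts dir '$fonts_dir' is not an absolute path to a directory"
elif command -v fc-list >/dev/null 2>&1; then
  # fc-list ships with fontconfig; ": family" prints only the family names.
  if fc-list : family | font_listed "TAU"; then
    echo "fontconfig lists a family matching TAU"
  else
    echo "fontconfig lists no family matching TAU"
  fi
fi
```

If fontconfig reports a family name different from TAU_VASN, that reported name is presumably what --fontlist needs, going by the NOTE quoted above.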
Once I am able to clear this, I will start looking at what mistakes I made
in the other parameters and start correcting them. If community members
are able to point out mistakes in the parameters, it would be great.
-Sibi