Hi,
I'm new to Tesseract and have a PDF file that contains diacritical marks. I
ran Tesseract 4.0.0 with the language set to eng, and it does not recognize
the characters that carry diacritical marks. I found a font that supports
these diacritical marks:

Gandhari Unicode 5.1 
<http://andrewglass.org/download.php?fname=gu5-110_ttf&extn=zip>

I extracted the font files and copied them to
/home/tesseract/Downloads/fonts

Whenever I run tesstrain.sh, it fails with the error "Could not find font
named gandhariunicode":

./tesstrain.sh --fontlist 'gandhariunicode' \
    --fonts_dir /home/tesseract/Downloads/fonts/ \
    --lang eng \
    --langdata_dir /usr/local/share/tessdata/ \
    --overwrite

=== Starting training for language 'eng'
[Mon Aug 28 23:18:12 PDT 2017] /usr/local/bin/text2image 
--fonts_dir=/home/tesseract/Downloads/fonts/ --font=gandhariunicode 
--outputbase=/tmp/font_tmp.C9vSySTfge/sample_text.txt 
--text=/tmp/font_tmp.C9vSySTfge/sample_text.txt 
--fontconfig_tmpdir=/tmp/font_tmp.C9vSySTfge
Could not find font named gandhariunicode.
Pango suggested font Gandhari Unicode.
Please correct --font arg.
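The log already hints at the likely fix: text2image could not match the name gandhariunicode, but Pango reported the family name as "Gandhari Unicode" (capital G, capital U, and a space). A minimal sketch, assuming a bash shell: the hypothetical helper suggested_font pulls that name out of text2image's output, and the hypothetical fontlist_args shows that quoting keeps a name containing a space as one argument after --fontlist. Neither helper is part of Tesseract.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read text2image output on stdin and print the font
# name from the "Pango suggested font <name>." line, e.g. "Gandhari Unicode".
suggested_font() {
  # Capture everything between "Pango suggested font " and the final period.
  sed -n 's/^Pango suggested font \(.*\)\.$/\1/p'
}

# Hypothetical helper: print the --fontlist arguments one per line.
# Because "$1" is quoted, a name such as "Gandhari Unicode" stays a single
# argument (one line) instead of being split at the space.
fontlist_args() {
  printf '%s\n' --fontlist "$1"
}
```

Since text2image may write these messages to stderr, its output would be piped in with 2>&1. Once the name is known, it would be passed to tesstrain.sh quoted, as --fontlist "Gandhari Unicode", instead of 'gandhariunicode'. Running text2image --fonts_dir=/home/tesseract/Downloads/fonts/ --list_available_fonts should also print the exact names text2image can see in that directory.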

=== Phase I: Generating training images ===
ERROR: Could not find training text file 
/usr/local/share/tessdata//eng/eng.training_text
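The path in this second error suggests that tesstrain.sh looks for <langdata_dir>/<lang>/<lang>.training_text, while /usr/local/share/tessdata/ normally holds .traineddata files rather than training text. A minimal sketch, assuming bash; training_text_path and check_langdata are hypothetical helpers, and the directory layout is inferred from the error message, not taken from the tesstrain.sh source. In practice --langdata_dir is usually pointed at a checkout of the tesseract-ocr langdata repository, which contains eng/eng.training_text.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the training text path that tesstrain.sh
# appears to expect for a given langdata directory and language code.
training_text_path() {
  local dir=${1%/} lang=$2   # strip one trailing slash to avoid "//"
  printf '%s/%s/%s.training_text\n' "$dir" "$lang" "$lang"
}

# Hypothetical helper: succeed and print the path if the training text
# exists; otherwise print a hint on stderr and return 1.
check_langdata() {
  local path
  path=$(training_text_path "$1" "$2")
  if [ -f "$path" ]; then
    echo "found: $path"
  else
    echo "missing: $path (point --langdata_dir at a directory containing $2/$2.training_text)" >&2
    return 1
  fi
}
```

For example, check_langdata /usr/local/share/tessdata/ eng would report the same missing file that tesstrain.sh complained about, as long as that directory has no eng/ subfolder.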

What could the issue be? Please let me know. Thanks in advance.

Thanks,
Anand

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.