[tesseract-ocr] Tesseract OCR 4.0.0 Alpha how to train a new font

Anand Akella Mon, 28 Aug 2017 23:49:22 -0700

Hi,
I'm new to Tesseract and have a PDF file containing text with diacritical 
marks. I ran Tesseract 4.0.0 with the language eng, but it does not 
recognize the text with diacritical marks. I found a font that supports 
diacritical marks:

Gandhari Unicode 5.1
<http://andrewglass.org/download.php?fname=gu5-110_ttf&extn=zip>

I extracted the font files and copied them to
/home/tesseract/Downloads/fonts

Whenever I run tesstrain.sh, it gives me the error "Could not find
font named gandhariunicode":

./tesstrain.sh --fontlist 'gandhariunicode' \
  --fonts_dir /home/tesseract/Downloads/fonts/ \
  --lang eng \
  --langdata_dir /usr/local/share/tessdata/ \
  --overwrite

=== Starting training for language 'eng'
[Mon Aug 28 23:18:12 PDT 2017] /usr/local/bin/text2image
--fonts_dir=/home/tesseract/Downloads/fonts/ --font=gandhariunicode
--outputbase=/tmp/font_tmp.C9vSySTfge/sample_text.txt
--text=/tmp/font_tmp.C9vSySTfge/sample_text.txt
--fontconfig_tmpdir=/tmp/font_tmp.C9vSySTfge
Could not find font named gandhariunicode.
Pango suggested font Gandhari Unicode.
Please correct --font arg.

=== Phase I: Generating training images ===
ERROR: Could not find training text file
/usr/local/share/tessdata//eng/eng.training_text

What could the issue be? Please let me know. Thanks in advance.

Thanks,
Anand
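
The log output above points at two things worth checking: the font name passed
to --font, and the location of eng.training_text. The following is a minimal
sketch, not a confirmed fix. It assumes the name Pango suggested ("Gandhari
Unicode") is the one text2image expects, and that --langdata_dir should point
to a directory containing eng/eng.training_text. The path /path/to/langdata is
hypothetical and stands in for wherever such a directory lives on your system.

```shell
#!/bin/sh
# Sketch only: paths come from the message above, except LANGDATA_DIR,
# which is a hypothetical placeholder.
FONTS_DIR=/home/tesseract/Downloads/fonts
FONT_NAME='Gandhari Unicode'    # name suggested by Pango in the log
LANGDATA_DIR=/path/to/langdata  # hypothetical; should contain eng/eng.training_text
TRAINING_TEXT="$LANGDATA_DIR/eng/eng.training_text"

# List the font names text2image can see in FONTS_DIR (skipped if not installed).
if command -v text2image >/dev/null 2>&1; then
  text2image --list_available_fonts --fonts_dir="$FONTS_DIR" || true
fi

# Report whether the training text that tesstrain.sh looks for is present.
if [ -f "$TRAINING_TEXT" ]; then
  echo "found: $TRAINING_TEXT"
else
  echo "missing: $TRAINING_TEXT"
fi

# Print (do not run) the adjusted tesstrain.sh call with the quoted font name.
echo "./tesstrain.sh --fontlist '$FONT_NAME' --fonts_dir $FONTS_DIR --lang eng --langdata_dir $LANGDATA_DIR --overwrite"
```

The script only inspects and prints. If the font listing shows a different
name for the font, use that exact string in --fontlist instead.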

