First try recognition with

best/Latin.traineddata

which should handle text with diacritics.
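
A minimal sketch of that first step, assuming you have downloaded Latin.traineddata into a directory of your choice and already converted the PDF page to an image (tesseract does not take PDF as input). The function name ocr_latin and its arguments are made up for illustration; only the tesseract options -l and --tessdata-dir are real.

```shell
# Run tesseract with the Latin script model, but only if the model file is present.
# $1 = input image, $2 = output base name, $3 = directory holding Latin.traineddata
ocr_latin() {
    if [ ! -f "$3/Latin.traineddata" ]; then
        echo "Latin.traineddata not found in $3" >&2
        return 1
    fi
    # -l Latin selects Latin.traineddata from the directory given by --tessdata-dir
    tesseract "$1" "$2" --tessdata-dir "$3" -l Latin
}
```

For example, ocr_latin page.png page /path/to/models would write the recognized text to page.txt, where /path/to/models is a placeholder for wherever you saved the file.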

-----------

>>Pango suggested font Gandhari Unicode.

Use "Gandhari Unicode", in quotes and spelled exactly as Pango suggests, as the font name.

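>>ERROR: Could not find training text file /usr/local/share/tessdata//eng/eng.training_text

Your command points the langdata directory at tessdata. Point it instead at the langdata folder where you have your training text (the folder that contains eng/eng.training_text).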
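
Putting both fixes together, a rough sketch of how the corrected call could look. The flags are the ones from your own command; the helper names has_training_text and print_tesstrain_cmd are made up for illustration, and the langdata path you pass in is whatever location you cloned or copied langdata to (that location is an assumption, not something I know about your machine).

```shell
# Succeed if the training text tesstrain.sh looks for exists under a langdata dir.
# $1 = langdata dir, $2 = language code
has_training_text() {
    [ -f "$1/$2/$2.training_text" ]
}

# Print (do not run) the tesstrain.sh command with the font name quoted.
# $1 = fonts dir, $2 = langdata dir, $3 = language code, $4 = font name
print_tesstrain_cmd() {
    printf "./tesstrain.sh --fontlist '%s' --fonts_dir %s --lang %s --langdata_dir %s --overwrite\n" \
        "$4" "$1" "$3" "$2"
}
```

You could first check has_training_text /path/to/langdata eng, then call print_tesstrain_cmd /home/tesseract/Downloads/fonts /path/to/langdata eng "Gandhari Unicode" and run the printed line once it looks right.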

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Aug 29, 2017 at 11:58 AM, Anand Akella <anand.ake...@gmail.com>
wrote:

> Hi,
> I'm new to tesseract and have a PDF file with diacritical marks. I tried to
> run tesseract 4.0.0 with language eng, but it does not recognize the text
> with diacritical marks. I found a font that supports diacritical marks:
>
> Gandhari Unicode 5.1
> <http://andrewglass.org/download.php?fname=gu5-110_ttf&extn=zip>
>
> I extracted the font files and copied them to
> /home/tesseract/Downloads/fonts
>
> Whenever I run tesstrain.sh, it gives me the error "Could not find
> font named gandhariunicode":
>
> ./tesstrain.sh --fontlist 'gandhariunicode' --fonts_dir
> /home/tesseract/Downloads/fonts/ --lang eng --langdata_dir
> /usr/local/share/tessdata/ --overwrite
>
> === Starting training for language 'eng'
> [Mon Aug 28 23:18:12 PDT 2017] /usr/local/bin/text2image
> --fonts_dir=/home/tesseract/Downloads/fonts/ --font=gandhariunicode
> --outputbase=/tmp/font_tmp.C9vSySTfge/sample_text.txt
> --text=/tmp/font_tmp.C9vSySTfge/sample_text.txt
> --fontconfig_tmpdir=/tmp/font_tmp.C9vSySTfge
> Could not find font named gandhariunicode.
> Pango suggested font Gandhari Unicode.
> Please correct --font arg.
>
> === Phase I: Generating training images ===
> ERROR: Could not find training text file /usr/local/share/tessdata//eng/eng.training_text
>
> What could the issue be? Please let me know. Thanks in advance.
>
> Thanks,
> Anand
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduVvNa%3DzGWHvZJ6aOa8r2x7frtPrrQ_P1oxV0U7xOmAhuA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
