It's a language thing, specifically typographic ligatures: https://en.wikipedia.org/wiki/Typographic_ligature
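To illustrate what a ligature is at the character level: Unicode has single code points for the "fi" and "fl" ligatures (U+FB01 and U+FB02), and NFKC normalization splits each back into two letters. The minimal sketch below uses only Python's standard `unicodedata` module. The helper `split_ligatures` is a hypothetical name for illustration. It is not a text2image option, and it assumes the one-box-per-pair output comes from a ligature glyph being rendered, which I haven't verified.

```python
import unicodedata

# U+FB01 and U+FB02 are the single-code-point "fi" and "fl" ligatures.
FI, FL = "\ufb01", "\ufb02"


def split_ligatures(text: str) -> str:
    """Hypothetical helper: expand compatibility characters (including fi/fl) via NFKC.

    Note: NFKC also rewrites other compatibility forms (e.g. full-width
    letters, superscripts), so this is broader than ligature splitting alone.
    """
    return unicodedata.normalize("NFKC", text)


for lig in (FI, FL):
    expanded = split_ligatures(lig)
    # One code point in, two letters out.
    print(f"U+{ord(lig):04X} {unicodedata.name(lig)}: {len(lig)} -> {expanded!r} ({len(expanded)})")
```

The point is that a ligature is a single unit that stands for two letters, which is consistent with seeing one box where you expected two.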

Have you tried specifying a particular language?

This parameter might be related, since its description mentions glyphs:

segment_penalty_dict_nonword    1.25    Score multiplier for glyph fragment segmentations which do not match a dictionary word (lower is better).
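If you want to experiment with that parameter and with the language setting together, the tesseract CLI accepts `-l LANG` and `-c VAR=VALUE`. The sketch below only builds the argument list; `build_tesseract_cmd`, `page.tif` and `page` are hypothetical names. 1.25 is the default listed above, so change it to test other values. Whether this parameter has any effect on text2image's box output is unverified. It is a recognition-time setting.

```python
import shlex


def build_tesseract_cmd(image, output_base, lang="eng", params=None):
    """Hypothetical helper: build an argv list for the tesseract CLI.

    Config variables are passed with one -c VAR=VALUE flag each, which
    applies them to this run only.
    """
    cmd = ["tesseract", image, output_base, "-l", lang]
    for name, value in (params or {}).items():
        cmd += ["-c", f"{name}={value}"]
    return cmd


cmd = build_tesseract_cmd(
    "page.tif", "page", lang="eng",
    params={"segment_penalty_dict_nonword": 1.25},  # default value; adjust to experiment
)
print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

To actually run it, pass `cmd` to `subprocess.run(cmd, check=True)`; keeping the command as a list avoids shell-quoting problems.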

Let me know what you find. I ran into this recently but have been chasing other issues and haven't verified a fix.


On Saturday, September 3, 2016 at 5:23:55 AM UTC-4, Brais Gabín Moreira wrote:
>
> Hi, I'm trying to train tesseract, but text2image creates a single box for
> 'fi' or 'fl'. Why does it treat 'fi' or 'fl' as a single character
> instead of two? How can I fix this?
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/d0e43a06-9f9a-4de8-9cf1-965f898cea8c%40googlegroups.com.