Yes, the spacing algorithm needs a total rewrite.
The problem is that trying to be general makes it more difficult to get the
typical case right.
When text is justified in a narrow column, e.g. a newspaper, the space between
letters and between words can vary from line to line, so it is difficult to
t
I have had the same experience getting spaces in many spots where none
should exist. Since I have no idea how to navigate the many Tess
variables, my approach has been to test and remove such spaces myself
post-scan, based on the width & spacing of characters in the current
word. Indeed italic or
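
For illustration, the post-scan cleanup described above could look roughly like the sketch below. It is not the poster's actual code. It assumes you already have per-character bounding boxes for one text line (for example from a box-file style output) plus a flag saying whether the OCR put a space before each character. The names `CharBox`, `remove_spurious_spaces` and the `gap_ratio` threshold are hypothetical and would need tuning per font.

```python
from statistics import median
from typing import List, NamedTuple


class CharBox(NamedTuple):
    """One recognized character on a text line (hypothetical structure)."""
    char: str           # the recognized character
    left: int           # x coordinate of the left edge, in pixels
    right: int          # x coordinate of the right edge, in pixels
    space_before: bool  # True if the OCR output had a space before this char


def remove_spurious_spaces(boxes: List[CharBox], gap_ratio: float = 0.5) -> str:
    """Rebuild the line, keeping an OCR-reported space only when the
    horizontal gap is wide compared with the typical character width.

    gap_ratio is an assumed tuning knob, not a Tesseract parameter.
    """
    if not boxes:
        return ""
    # The median width is less sensitive to a few very narrow or wide glyphs.
    typical_width = median(b.right - b.left for b in boxes)
    out = [boxes[0].char]
    for prev, cur in zip(boxes, boxes[1:]):
        gap = cur.left - prev.right  # can be negative when italics overlap
        if cur.space_before and gap > gap_ratio * typical_width:
            out.append(" ")
        out.append(cur.char)
    return "".join(out)
```

A space reported inside a word, where the gap is about the same as the other inter-character gaps, gets dropped, while a space with a clearly wider gap is kept. This only removes spaces; it never adds ones the OCR missed.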
In fact, Tesseract constantly and consistently fails on italic
uppercase fonts. In such fonts characters have low spacing
(measured in vertical spacing) and in many cases even overlap. I tried
to fix the source code with no success. It is not a matter of
adjusting a few constants. It is a desi
Hello colleagues,
I have the following problem: after a successful training, during the OCR
process Tesseract inserts additional spaces, not present in the text, in the
middle of some words, e.g. it splits the word "HRISTOVICH" into "HRISTO" +
[space] + "VICH". In this particular example the word is