I'd be grateful if someone could help me here.

Here is my request to Zdenko and the reply. 

> Could you perhaps help me understand (and then update the page to explain)
> the meaning of:
> "A quick check is to count the pixels of the x-height of your characters. 
> (X-height is the height of the lower case x.)"
> I have no idea what this means or how to do it.
>
Well then, it would be better if you found something other than tesseract.
Honestly. You will be lost and disappointed with tesseract, because
tesseract requires some knowledge (e.g. of image processing). It could be
compared to university: if you got there, it is expected that you finished
your studies in high school. Nobody there will bother to explain the
basics to you... IMO there cannot be a clearer definition of x-height and
what to do with it. BTW it is in the FAQ, and you complain about wrong
information in the Compilation wiki ;-)

Here is what the FAQ says:

There is a minimum text size for reasonable accuracy. You have to consider 
resolution as well as point size. Accuracy drops off below 10pt x 300dpi, 
rapidly below 8pt x 300dpi. A quick check is to count the pixels of the 
x-height of your characters. (X-height is the height of the lower case x.) 
At 10pt x 300dpi x-heights are typically about 20 pixels, although this can 
vary dramatically from font to font. Below an x-height of 10 pixels, you 
have very little chance of accurate results, and below about 8 pixels, most 
of the text will be "noise removed". 
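
As far as I can tell, the arithmetic behind those figures could be sketched
as below. This is only my reading of the FAQ, not anything taken from
tesseract itself. The function names are made up for illustration, and the
x_height_ratio of 0.48 is an assumption (the x-height as a fraction of the
point size differs from font to font). The second function counts the pixel
rows that contain ink in a small hand-made bitmap of a cropped lower case
"x", which is presumably what "count the pixels of the x-height" means.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def estimated_x_height_px(point_size, dpi, x_height_ratio=0.48):
    """Rough x-height in pixels for text of a given point size and scan resolution.

    x_height_ratio is an assumed font property (x-height / point size);
    it is not a tesseract setting and varies between fonts.
    """
    em_px = point_size / POINTS_PER_INCH * dpi  # full point size in pixels
    return em_px * x_height_ratio


def measured_height_px(bitmap):
    """Count the rows spanned by ink in a cropped glyph bitmap (1 = ink, 0 = background)."""
    ink_rows = [i for i, row in enumerate(bitmap) if any(row)]
    if not ink_rows:
        return 0
    return ink_rows[-1] - ink_rows[0] + 1


# Hypothetical, hand-drawn bitmap of a lower case "x" with one blank row
# above and below it.
SAMPLE_X = [
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print(round(estimated_x_height_px(10, 300)))  # compare with the FAQ's "about 20 pixels"
    print(measured_height_px(SAMPLE_X))           # ink rows in the sample glyph
```

With a real scan, the same count could be made by zooming in on a lower case
"x" in an image editor and counting the rows from its top to its bottom.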

So if someone could help me, I'm sure I wouldn't be the only one to benefit.
