Measure the height of a lowercase 'x' in your image using an image editor such as GIMP, or your platform's standard image viewer (for example, Windows Paint or Mac Preview).
If the height of a lowercase 'x' in your text is less than 20 pixels, you need to enlarge the image or rescan your documents at a higher resolution.

--Sven

On Mon, Nov 12, 2012 at 10:40 AM, chikev <[email protected]> wrote:

> I'd be grateful if someone could help me here.
>
> Here is my request to Zdenko and the reply.
>
>> Could you perhaps help me understand, and then change the page, the
>> meaning of:
>> "A quick check is to count the pixels of the x-height of your characters.
>> (X-height is the height of the lower case x.)"
>> I have no idea what this means or how to do it.
>>
> Well then it would better if you find something else than tesseract.
> Honestly. You will be lost and disappointed with tesseract because
> tesseract requires some knowledge (e.g. from image processing). It could be
> compared to university - if you got there it is expected that you finished
> your studies in high-school. Nobody there will bother to explain you
> basis... IMO there can not be clearer definition of x-height and what to
> do with it. BTW it is in FAQ and you complain about wrong information in
> Compilation wiki ;-)
>
> Here is what the FAQ says:
>
> There is a minimum text size for reasonable accuracy. You have to consider
> resolution as well as point size. Accuracy drops off below 10pt x 300dpi,
> rapidly below 8pt x 300dpi. A quick check is to count the pixels of the
> x-height of your characters. (X-height is the height of the lower case x.)
> At 10pt x 300dpi x-heights are typically about 20 pixels, although this can
> vary dramatically from font to font. Below an x-height of 10 pixels, you
> have very little chance of accurate results, and below about 8 pixels, most
> of the text will be "noise removed".
>
> So if someone could help me, I'm sure I wouldn't be the only one to
> benefit.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

--
``All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king.''
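
For anyone who would rather see the rule of thumb from this thread as arithmetic, here is a minimal Python sketch. It assumes you have already measured the x-height by hand in an image program, and that x-height grows linearly with scan resolution. The function names and the exact wording of the messages are hypothetical, not part of Tesseract; the 20, 10 and 8 pixel thresholds are the ones quoted from the FAQ above.

```python
import math

# Thresholds taken from the advice and FAQ text quoted in this thread.
TARGET_X_HEIGHT_PX = 20  # x-height to aim for (typical of 10pt at 300dpi)
LOW_ACCURACY_PX = 10     # below this: very little chance of accurate results
NOISE_PX = 8             # below about this: most text is "noise removed"


def assess_x_height(measured_px):
    """Map a hand-measured x-height in pixels to the FAQ's rough guidance."""
    if measured_px <= 0:
        raise ValueError("x-height must be a positive number of pixels")
    if measured_px < NOISE_PX:
        return "too small: most text is likely to be removed as noise"
    if measured_px < LOW_ACCURACY_PX:
        return "too small: accurate results are unlikely"
    if measured_px < TARGET_X_HEIGHT_PX:
        return "small: resize or rescan"
    return "ok"


def scale_factor(measured_px, target_px=TARGET_X_HEIGHT_PX):
    """Factor to enlarge the image by so the x-height reaches target_px.

    Returns 1.0 when the text is already large enough.
    """
    if measured_px <= 0:
        raise ValueError("x-height must be a positive number of pixels")
    return max(1.0, target_px / measured_px)


def rescan_dpi(current_dpi, measured_px, target_px=TARGET_X_HEIGHT_PX):
    """Scan resolution expected to give target_px.

    Assumption: x-height scales linearly with DPI. The result is rounded up
    and is never lower than the current resolution.
    """
    if measured_px <= 0:
        raise ValueError("x-height must be a positive number of pixels")
    needed = math.ceil(current_dpi * target_px / measured_px)
    return max(current_dpi, needed)
```

For example, if the 'x' measures 10 pixels in a 150dpi scan, `scale_factor(10)` gives 2.0 and `rescan_dpi(150, 10)` gives 300, which matches the 10pt x 300dpi figure in the FAQ.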

