cheers
that was easy!!
many thanks
I wonder if Z will now change the FAQ to tell ppl to use an image program to do the measuring?

Cheers

[email protected]
32 Hawera Rd
Kohimarama 1071
Auckland, New Zealand
+64 (0)9 528 1174 home
+64 (0)226 710 335 cell
http://kmccready.wordpress.com

On 13/11/12 08:20, Sven Pedersen wrote:
Measure the height of a lower-case 'x' in your image using an image program, such as GIMP or the standard image viewer on your platform (such as Windows Paint or Mac Preview).

If the height of a lower-case 'x' in your text is less than 20 pixels, you need to resize it or rescan your documents.
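
As a rough cross-check on the 20-pixel figure, here is a minimal Python sketch of the arithmetic. It relies on 1 pt = 1/72 inch. The `x_height_ratio` of 0.48 is only an assumed typical value (real fonts vary a lot), and both function names are made up for illustration; neither is part of tesseract.

```python
POINTS_PER_INCH = 72  # 1 pt = 1/72 inch


def estimated_x_height_px(point_size, dpi, x_height_ratio=0.48):
    """Rough x-height in pixels for text of a given point size at a given dpi.

    x_height_ratio (x-height divided by font size) is an assumed typical
    value; measure the actual image when in doubt.
    """
    em_px = point_size * dpi / POINTS_PER_INCH  # font size in pixels
    return em_px * x_height_ratio


def scale_factor_for(measured_px, target_px=20):
    """Enlargement factor needed for a measured x-height to reach target_px.

    Returns 1.0 when the text is already at least target_px tall.
    """
    if measured_px <= 0:
        raise ValueError("measured x-height must be positive")
    return max(1.0, target_px / measured_px)


if __name__ == "__main__":
    print(round(estimated_x_height_px(10, 300)))  # 20
    print(round(scale_factor_for(12), 2))         # 1.67
```

So an 'x' that measures 12 pixels tall would need enlarging by roughly 1.67 times, or a rescan at a correspondingly higher resolution.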
--Sven


On Mon, Nov 12, 2012 at 10:40 AM, chikev <[email protected]> wrote:

    I'd be grateful if someone could help me here.

    Here is my request to Zdenko and the reply.

        Could you perhaps help me understand, and then change the
        page, the meaning of:
        "A quick check is to count the pixels of the x-height of your
        characters. (X-height is the height of the lower case x.)"
        I have no idea what this means or how to do it.

    Well then it would be better if you found something other than
    tesseract. Honestly. You will be lost and disappointed with
    tesseract because tesseract requires some knowledge (e.g. of
    image processing). It could be compared to university: if you got
    there, it is expected that you finished your studies in
    high school. Nobody there will bother to explain the basics to
    you... IMO there cannot be a clearer definition of x-height and
    what to do with it. BTW it is in the FAQ and you complain about
    wrong information in the Compilation wiki ;-)

    Here is what the FAQ says:

    There is a minimum text size for reasonable accuracy. You have to
    consider resolution as well as point size. Accuracy drops off
    below 10pt x 300dpi, rapidly below 8pt x 300dpi. A quick check is
    to count the pixels of the x-height of your characters. (X-height
    is the height of the lower case x.) At 10pt x 300dpi x-heights are
    typically about 20 pixels, although this can vary dramatically
    from font to font. Below an x-height of 10 pixels, you have very
    little chance of accurate results, and below about 8 pixels, most
    of the text will be "noise removed".

    So if someone could help me, I'm sure I wouldn't be the only one
    to benefit.

    --
    You received this message because you are subscribed to the Google
    Groups "tesseract-ocr" group.
    To post to this group, send email to
    [email protected]
    To unsubscribe from this group, send email to
    [email protected]
    For more options, visit this group at
    http://groups.google.com/group/tesseract-ocr?hl=en




--
"All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king."
