Re: Recognizing small short text - empty page

Andres Tue, 24 May 2011 08:55:14 -0700

Hi,

I'm not an expert with Tesseract, but I will give my point of view to see if
it helps:


To recognize single chars avoiding layout analysis in Tesseract 3.0:

http://markmail.org/message/km6fzufboilckjcf

Before that, you should split the image so that you have one image per
char. You can achieve that in many ways, depending on the particulars of
the images in your app. For example, get the contours or blobs of the
chars, compute their bounding boxes, and then crop the image inside each
box. You can do that with OpenCV, for example.
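
As a rough sketch of that splitting step, here is a pure-Python stand-in
for what a library such as OpenCV would do for you. It is not OpenCV code:
it assumes the image is already binarized into a list of rows where 1 is
ink and 0 is background, and the names char_bounding_boxes and crop are
made up for this illustration. It labels 8-connected blobs, takes the
bounding box of each one, and crops the image to that box.

```python
from collections import deque


def char_bounding_boxes(binary):
    """Return inclusive (left, top, right, bottom) boxes, one per
    8-connected blob of 1s, sorted left to right."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            # Flood-fill this blob and track its extent as we go.
            seen[y][x] = True
            queue = deque([(x, y)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        inside = 0 <= nx < width and 0 <= ny < height
                        if inside and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return sorted(boxes)


def crop(binary, box):
    """Cut the sub-image covered by an inclusive bounding box."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in binary[top:bottom + 1]]
```

Each cropped sub-image can then be passed to the recognizer on its own.
Note that this treats every connected blob as one char, which should be
fine for digits but would split a char made of separate parts (such as
"i") into two boxes.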

Or you could perhaps scale up the images to see what happens (I'm not sure
about this approach).
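
If you want to try the scaling idea, this is a minimal sketch of the
simplest form of it, nearest-neighbour enlargement by a whole-number
factor. It assumes the image is a list of rows of pixel values, and
scale_up is a name I made up for the illustration. In practice an image
library with smoother interpolation may give better results; I have not
checked which works best for Tesseract.

```python
def scale_up(pixels, factor):
    """Enlarge an image (list of rows) by repeating every pixel
    `factor` times horizontally and vertically."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    scaled = []
    for row in pixels:
        # Repeat each pixel across the row, then repeat the widened row.
        wide = [value for value in row for _ in range(factor)]
        scaled.extend(list(wide) for _ in range(factor))
    return scaled
```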

Cheers,

Andres
--


2011/5/24 Joyse1 <[email protected]>

> Hi,
>   I'm training Tess with "0123456789" text in MicrosoftSansSerif font size
> 8 (a small one). I need to recognize small short text only. I have noticed
> that Tess has some built-in page layout analysis mechanism which thinks that
> small short text is noise, and it produces an empty page or just a part of
> the text. Could you tell me how I can bypass it? What is the solution for this?
>
> Best
> Jakub
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

