Not exactly an answer, but someone else with the same issue has gotten most
of the way there.

http://stackoverflow.com/questions/24385714/detect-text-region-in-image-using-opencv
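
For what it's worth, here is a minimal sketch of the "filter out the false
positives, then extract the text regions" step that comes up further down the
thread. It assumes you already have candidate bounding boxes from some detector
(Olena, the OpenCV approach in the link above, or anything else). The names
Box, filter_text_boxes and pad_box, and all of the thresholds, are made up for
illustration and are not part of Olena, OpenCV or tesseract; treat the numbers
as starting points to tune on your own images.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Hypothetical bounding box: top-left corner plus width and height, in pixels."""
    x: int
    y: int
    w: int
    h: int


def filter_text_boxes(boxes: List[Box],
                      min_area: int = 100,
                      min_aspect: float = 1.5,
                      max_aspect: float = 50.0) -> List[Box]:
    """Keep boxes whose size and shape are plausible for a horizontal text line.

    The thresholds are assumptions, not values taken from any library.
    """
    kept = []
    for b in boxes:
        if b.w <= 0 or b.h <= 0:
            continue  # degenerate box, nothing to crop
        area = b.w * b.h
        aspect = b.w / b.h  # wide and short boxes look like text lines
        if area >= min_area and min_aspect <= aspect <= max_aspect:
            kept.append(b)
    return kept


def pad_box(b: Box, pad: int, img_w: int, img_h: int) -> Box:
    """Grow a box by `pad` pixels on every side, clamped to the image bounds."""
    x0 = max(b.x - pad, 0)
    y0 = max(b.y - pad, 0)
    x1 = min(b.x + b.w + pad, img_w)
    y1 = min(b.y + b.h + pad, img_h)
    return Box(x0, y0, x1 - x0, y1 - y0)


# Made-up candidates: one text-like line, one speck, one tall narrow strip.
candidates = [Box(10, 10, 200, 20), Box(50, 60, 5, 5), Box(0, 100, 30, 300)]
text_boxes = filter_text_boxes(candidates)
padded = [pad_box(b, 15, 220, 100) for b in text_boxes]
```

Each padded box could then be cropped out of the image and passed to tesseract
on its own. A little padding is usually worth keeping, since crops that cut
right up against the glyphs tend to OCR worse.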

On 22 January 2015 at 15:35, newbie <spens.mallang...@gmail.com> wrote:

> ShreeDevi,
> ImageMagick seems like a manual tool, but I think the problem I need to
> solve is a generic way of preprocessing all images.
>
> Art,
>    I have been looking for a text region segregation tool and had found only
> one, from MathWorks, that looked promising. Does Olena provide an API, rather
> than just a tool, for preprocessing the image (marking text regions)
> programmatically? I will look into the documentation more.
>
> Thanks Art !
>
> On Wednesday, January 21, 2015 at 4:19:54 PM UTC-5, Art Rhyno wrote:
>>
>> I have posted about this before but the Olena project [1] has some great
>> tools to identify text and images. Look for the "content_in_hdoc" program
>> for example. If the identification looks close enough, you could extract
>> and pass to tesseract those regions that have been classed as text. I have
>> attached an example from your "vip1200.jpg" image, the portion in green is
>> identified as text. It also picks up some false positives, but you could
>> probably filter those out.
>>
>> art
>> ---
>> 1. http://www.lrde.epita.fr/cgi-bin/twiki/view/Olena/
>>
>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/5c36a5b3-1923-424d-beef-e5ce005129a1%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/5c36a5b3-1923-424d-beef-e5ce005129a1%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

