Not exactly an answer, but someone else with the same issue has gotten most of the way there.
http://stackoverflow.com/questions/24385714/detect-text-region-in-image-using-opencv

On 22 January 2015 at 15:35, newbie <spens.mallang...@gmail.com> wrote:

> ShreeDevi,
> ImageMagick seems like a manual tool, but I think the problem I need to
> solve is finding a generic way to preprocess all images.
>
> Art,
> I have been looking for a text region segmentation tool and had found only
> one, from MathWorks, that looked promising. Does Olena provide an API,
> rather than just a tool, to preprocess the image (mark text regions)
> programmatically? I will look into the documentation more.
>
> Thanks, Art!
>
> On Wednesday, January 21, 2015 at 4:19:54 PM UTC-5, Art Rhyno wrote:
>>
>> I have posted about this before, but the Olena project [1] has some great
>> tools to identify text and images. Look for the "content_in_hdoc" program,
>> for example. If the identification looks close enough, you could extract
>> the regions that have been classed as text and pass them to tesseract. I
>> have attached an example from your "vip1200.jpg" image; the portion in
>> green is identified as text. It also picks up some false positives, but
>> you could probably filter those out.
>>
>> art
>> ---
>> 1. http://www.lrde.epita.fr/cgi-bin/twiki/view/Olena/
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/5c36a5b3-1923-424d-beef-e5ce005129a1%40googlegroups.com
> For more options, visit https://groups.google.com/d/optout.