Selecting a single line sometimes gives better results for Devanagari as well, but I think that is because line segmentation does not work correctly for Hindi, owing to the marks above and below the line. That's why, during training, tesseract reports such varying x-heights even though the whole page is in the same font and size.
Not sure whether this is the cause in your case, though. I am hoping that 3.03 will take care of some of these issues; I am still waiting for the source to compile on Windows, though.

Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Oct 18, 2013 at 1:00 PM, Andreas Lüdeke <alued...@testobject.com> wrote:
> Hey,
>
> some updates:
>
> For preprocessing I started experimenting with unsharp masks to increase
> the local contrast of the image. With that I am now able to get the
> 'Wallpaper' string.
>
> Anyway, I am still wondering why the results are sometimes that bad if I
> recognize the whole screen:
>
> The layout results I got from TessBaseAPIAnalyseLayout are pretty good
> now. But on some screens the recognition gives completely wrong results
> for all strings. Then I use the results from TessBaseAPIAnalyseLayout to
> restrict the area that recognition should get text from, and tada, the
> results are fine.
> Any ideas why this happens?
>
> BR
>
> Andreas
>
> On Thursday, October 17, 2013 8:15:32 AM UTC+2, Andreas Lüdeke wrote:
>>
>> Hey,
>>
>> I am trying to recognize text on screenshots (see the attached
>> example). My goal is to recognize all strings on a screenshot.
>>
>> I am upscaling all images by a factor of 3 to get at least 300 dpi.
>> I am using PSM_SINGLE_BLOCK page segmentation.
>> I am using the Tess4j wrapper.
>>
>> With this I got most of the strings out of the images.
>> Most strings have a lot of issues, so I recognize every string again
>> with the rectangle I got from the first scan.
>> (This isn't very efficient.)
>>
>> Could someone recommend some preprocessing steps to get all strings out
>> of a screenshot (e.g. in the sample the 'Wallpaper' string is missing)?
>> Could someone propose how to prevent duplicate recognition of all strings?
>>
>> Thx in advance
>>
>> Andreas

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com
To unsubscribe from this group, send email to tesseract-ocr+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
---
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
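
For anyone wondering what the unsharp-mask preprocessing mentioned in this thread does to the pixels, here is a rough sketch in plain Python. It is not the code Andreas used (the thread does not show it) and it is not part of Tesseract or Tess4j. The function names `box_blur` and `unsharp_mask` and the parameters `radius` and `amount` are made up for illustration, and a simple box blur stands in for the Gaussian blur that image tools normally use. The image is assumed to be grayscale, given as a list of rows of 0-255 integers.

```python
def box_blur(pixels, radius=1):
    """Mean-filter a grayscale image; the window is clipped at the borders."""
    height, width = len(pixels), len(pixels[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            count = 0
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += pixels[ny][nx]
                    count += 1
            row.append(total / count)
        blurred.append(row)
    return blurred


def unsharp_mask(pixels, radius=1, amount=1.0):
    """Return original + amount * (original - blurred), clamped to 0..255."""
    blurred = box_blur(pixels, radius)
    return [
        [min(255, max(0, round(p + amount * (p - b))))
         for p, b in zip(row, blurred_row)]
        for row, blurred_row in zip(pixels, blurred)
    ]


# A weak vertical edge (100 -> 140), like faint text on a screenshot.
image = [[100, 100, 140, 140] for _ in range(3)]
sharpened = unsharp_mask(image)
print(sharpened[0])  # [100, 87, 153, 140]
```

The step across the edge grows from 40 to 66 gray levels while flat areas stay untouched, which is the local contrast boost described above. In practice you would use the unsharp-mask filter of an imaging library rather than nested Python loops.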