Selecting a single line sometimes gives better results for Devanagari too,
but I think that is because line segmentation does not work correctly for
Hindi, owing to the marks above and below the line. That is also why
Tesseract reports such varying x-heights during training, even though the
whole page is in the same font and size.

I am not sure whether this is the cause in your case, though.

I am hoping that 3.03 will take care of some of these issues, but I am still
waiting for the source to compile on Windows.

Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


On Fri, Oct 18, 2013 at 1:00 PM, Andreas Lüdeke <alued...@testobject.com> wrote:

> Hey,
>
> some updates:
>
> For preprocessing I started experimenting with unsharp masks to increase
> the local contrast of the image. With that I am now able to get the
> 'Wallpaper' string.
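
For readers unfamiliar with the technique, here is a minimal sketch of an
unsharp mask on a grayscale image held as a list of rows. It is not the code
used in the thread: the function names, the 3x3 box blur (a real unsharp mask
normally uses a Gaussian blur) and the `amount` parameter are all assumptions
made for illustration.

```python
def box_blur(img):
    """3x3 box blur on a 2D list of grayscale values; borders are clamped."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def unsharp_mask(img, amount=1.0):
    """Add the difference between the image and its blur back to the image.

    sharpened = original + amount * (original - blurred), clipped to 0..255.
    `amount` is a hypothetical tuning knob, not a Tesseract parameter.
    """
    blurred = box_blur(img)
    return [
        [int(min(255, max(0, round(p + amount * (p - b)))))
         for p, b in zip(row, blurred_row)]
        for row, blurred_row in zip(img, blurred)
    ]
```

Flat areas are left unchanged, while pixels next to an edge are pushed further
apart, which is the local contrast increase described above.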
>
> Anyway, I am still wondering why the results are sometimes so bad when I
> recognize the whole screen:
>
> The layout results I get from TessBaseAPIAnalyseLayout are pretty good
> now.
> But on some screens the recognition gives completely wrong results for all
> strings. If I then use the results from TessBaseAPIAnalyseLayout to restrict
> the area that recognition reads text from, the results are fine.
> Any ideas why this happens?
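
The two-step approach described above (layout analysis first, then
recognition restricted to each region) might be sketched as follows. This is
an illustration only: `recognize` is a hypothetical caller-supplied function
standing in for whatever call your wrapper offers to recognize a single
rectangle, boxes are assumed to be `(x, y, width, height)` tuples, and the
small `margin` added around each box is an assumption, not something reported
in the thread.

```python
def pad_box(box, img_w, img_h, margin=4):
    """Grow a (x, y, w, h) box by `margin` pixels, clamped to the image."""
    x, y, w, h = box
    left = max(0, x - margin)
    top = max(0, y - margin)
    right = min(img_w, x + w + margin)
    bottom = min(img_h, y + h + margin)
    return (left, top, right - left, bottom - top)


def recognize_regions(boxes, img_w, img_h, recognize, margin=4):
    """Run `recognize` once per layout box instead of once per screen.

    `recognize` is hypothetical: it takes a (x, y, w, h) rectangle and
    returns the text found inside it.
    """
    results = []
    for box in boxes:
        padded = pad_box(box, img_w, img_h, margin)
        results.append((padded, recognize(padded)))
    return results
```

Because the first pass only produces boxes, each string is recognized exactly
once.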
>
> BR
>
> Andreas
>
> On Thursday, October 17, 2013 8:15:32 AM UTC+2, Andreas Lüdeke wrote:
>>
>> Hey,
>>
>> I am currently trying to recognize text on screenshots (see the attached
>> example). My goal is to recognize all strings on a screenshot.
>>
>> I am currently upscaling all images by a factor of 3 to get at least 300 dpi.
>> I am using PSM_SINGLE_BLOCK page segmentation.
>> I am using the Tess4J wrapper.
>>
>> With this I get most of the strings out of the images.
>> Most strings have a lot of errors, so I recognize every string again
>> using the rectangle I got from the first scan.
>> (This isn't very efficient.)
>>
>> Could someone recommend some preprocessing steps to get all strings out
>> of a screenshot (e.g. in the sample the 'Wallpaper' string is missing)?
>> Could someone suggest how to avoid recognizing every string twice?
>>
>> Thx in advance
>>
>> Andreas
>>
>  --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to tesseract-ocr@googlegroups.com
> To unsubscribe from this group, send email to
> tesseract-ocr+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
