Dmitri,
Thanks for the valuable suggestion.
With regards,
-sriranga(78yrs)

On Sat, Aug 20, 2011 at 5:49 PM, Dmitri Silaev <[email protected]> wrote:

> As a rule of thumb, one can usually obtain good recognition
> results for all standard regular fonts of 11-16pt size, be it a
> screenshot or a 300 DPI scanned image. Should font size, resolution,
> etc. differ significantly from these numbers, recognition quality
> becomes a matter of experimentation.
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
>
>
>
>
> On Sat, Aug 20, 2011 at 2:14 PM, Sriranga(78yrsold)
> <[email protected]> wrote:
> > Dmitri,
> > The issue is really very complicated for a layman user to understand.
> > For training purposes in tesseract-ocr, what guidance would you give
> > to users who generally depend on a scanner and the computer's
> > "Print Screen" key?
> > 1) For scanning typed text: (a) what font size should be used in the
> > text, and (b) what resolution should be set in the scanner?
> > 2) For a screenshot of a typed text file: with the help of IrfanView,
> > ImageMagick, etc., should the resolution be increased from 96 to 300 dpi
> > for any image format like tif, png, etc.?
> > With regards,
> > -sriranga(78yrs)
> >
> >
> > On Sat, Aug 20, 2011 at 1:33 PM, Dmitri Silaev <[email protected]>
> > wrote:
> >>
> >> There are different cases of how the pixel height of a font's
> >> characters should be calculated. If you're trying to recognize a
> >> screenshot, you may take one pt to be equal to one pixel when typing
> >> in Windows Paint. However, this might not be true for more complex
> >> editors like Photoshop. It also depends on the physical size of the
> >> screen's pixels and the current video mode resolution. Another case is
> >> a scanned image; here pixel height depends on the scanning resolution.
> >> Still another case, where imho trying to relate pixel height to the
> >> font's point size makes no sense at all (although it is possible via
> >> some multi-parameter formulas), is a photographic or video frame
> >> image; here pixel height varies with the camera position and can even
> >> vary within a single line of text.
> >>
> >> All in all, Tesseract does not bother itself with DPIs, pt sizes,
> >> etc.; only pixel size is important for recognition. You can use this
> >> formula for scanned images to roughly determine font pixel height:
> >>
> >> pixels = DPI * pts / 72
> >>
> >> where pixels is the pixel height to be found, DPI is the scanning
> >> resolution, and pts is the font size in typographic points.
> >>
> >> However, the most reliable approach is to scan a test page and count
> >> the pixels manually.
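To make the formula above concrete, here is a minimal Python sketch. The function name font_height_px is only an illustrative assumption and is not part of Tesseract. Note that the result is the nominal font size in pixels; the visible letters, and the x-height in particular, are somewhat smaller, so treat the numbers as rough estimates.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 of an inch


def font_height_px(dpi: float, pts: float) -> float:
    """Rough font height in pixels: pixels = DPI * pts / 72."""
    return dpi * pts / POINTS_PER_INCH


if __name__ == "__main__":
    # Print a small table for a few common sizes and resolutions.
    for pts in (10, 12, 20):
        for dpi in (72, 96, 300):
            px = font_height_px(dpi, pts)
            print(f"{pts} pt at {dpi} DPI -> {px:.1f} px")
```

For example, 12 pt text scanned at 300 DPI comes out at about 50 px, while the same text at 72 DPI is only about 12 px.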
> >>
> >> For those willing to understand everything, here are the links:
> >> http://en.wikipedia.org/wiki/Dots_per_inch
> >> http://en.wikipedia.org/wiki/Point_%28typography%29
> >> http://en.wikipedia.org/wiki/X-height
> >>
> >> Warm regards,
> >> Dmitri Silaev
> >> www.CustomOCR.com
> >>
> >>
> >>
> >>
> >>
> >> On Sat, Aug 20, 2011 at 7:48 AM, Sriranga(78yrsold)
> >> <[email protected]> wrote:
> >> > Dmitri,
> >> > Thanks for the valuable guidance. I seek some clarification as
> >> > follows:
> >> > (1) "Tesseract, trained with ordinary fonts, proved good with fonts
> >> > of 12-64 pixel height": it would be nice if the equivalent font size
> >> > for a pixel height of 12-64 were indicated. For a 10 or 20 pt regular
> >> > (ordinary) font, what is the pixel height used in Notepad?
> >> > I am neither a programmer nor a developer, so I am seeking guidance
> >> > as a user.
> >> > BTW, is it possible to count the pixels of a regular font of any
> >> > size, say 20 pt, in Paintbrush, which has a grid (graph-like)? I just
> >> > tested this in Paintbrush; see the attached screenshot. The letters
> >> > were typed using Arial 20 and, when counted, were 20 pixels high.
> >> >
> >> > Thus I presume that a 12-64 pixel height is equivalent to a 12-64
> >> > point size of an ordinary font. Kindly confirm.
> >> > With warmest regards,
> >> > -sriranga(78yrs)
> >> >
> >> >
> >> > On Sat, Aug 20, 2011 at 1:00 AM, Dmitri Silaev <[email protected]>
> >> > wrote:
> >> >>
> >> >> The DPI measure is confusing for Tesseract's OCR; forget about it.
> >> >> The big thing is the font's x-height within the image, measured in
> >> >> pixels. Tesseract, trained with ordinary fonts, proved good with
> >> >> fonts of 12-64 pixel height. If you have bigger characters, scale
> >> >> them down. If you have a font that's bold, use morphology and erode
> >> >> the characters after binarization. Experiment. Removing "greyness"
> >> >> won't help, as it's not a generic way of getting rid of uneven
> >> >> illumination; you need to use more sophisticated algorithms. Just
> >> >> using Photoshop won't let you achieve much.
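The two preprocessing steps mentioned above (scaling characters into the 12-64 px range, and eroding bold characters after binarization) could be sketched roughly as follows in plain Python. Everything here is an assumption for illustration: the function names, the default target of 32 px, the idea of also scaling small text up, and the 3x3 structuring element are not from Tesseract or from this thread. A real pipeline would more likely use an image library.

```python
def scale_factor(x_height_px, lo=12, hi=64, target_px=32):
    """Factor to resize an image by so its text lands at target_px.

    Returns 1.0 when the measured height is already within [lo, hi].
    The target of 32 px is an arbitrary value inside the range.
    """
    if x_height_px <= 0:
        raise ValueError("x_height_px must be positive")
    if lo <= x_height_px <= hi:
        return 1.0
    return target_px / x_height_px


def erode(img):
    """3x3 binary erosion on a list of rows (1 = ink, 0 = background).

    A pixel stays ink only if all nine pixels in its 3x3 neighbourhood
    are ink; pixels outside the image count as background.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ))
    return out


if __name__ == "__main__":
    print(scale_factor(128))  # characters too big: shrink to a quarter
    blob = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in erode(blob):
        print(row)
```

Keep in mind that one pass of 3x3 erosion removes a pixel from every side of a stroke, so strokes two pixels wide or thinner disappear completely. That is why this step only makes sense for genuinely heavy glyphs, and why experimenting is advised.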
> >> >>
> >> >> Warm regards,
> >> >> Dmitri Silaev
> >> >> www.CustomOCR.com
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Fri, Aug 19, 2011 at 8:18 PM, Andriy Malovanyy
> >> >> <[email protected]> wrote:
> >> >> > To Zdenko:
> >> >> > I think I have version 3.0 installed, so maybe I should install
> >> >> > the new version and try it. Thanks for the description of psm. Did
> >> >> > you try to recognize the other unedited images which I attached to
> >> >> > the first post?
> >> >> >
> >> >> > To Rob:
> >> >> > Initially I had a 640x480 image at 72 dpi, with the number
> >> >> > occupying almost all of the image. I just opened the image in
> >> >> > Photoshop, went to the image size menu, changed the resolution to
> >> >> > 300 dpi (the image increased in size), and set the image size back
> >> >> > to 640x480. So I ended up with a 640x480 image at 300 dpi
> >> >> > resolution.
> >> >> >
> >> >> > On 19 Aug, 17:56, Robert Komar <[email protected]> wrote:
> >> >> >> On Fri, 19 Aug 2011, Andriy Malovanyy wrote:
> >> >> >> > To sriranga:
> >> >> >> > I tried changing dpi (check the previous post). It doesn't work.
> >> >> >>
> >> >> >> Did you rescale the image from 72 dpi to 300 dpi, or just change
> >> >> >> the tag on the original image to say 300 dpi?  The latter won't
> >> >> >> work.
> >> >> >> Tesseract seems to be tuned to work best for scans at 300 dpi
> >> >> >> (although I've often successfully used 600 dpi).  Scans done at
> >> >> >> 72 dpi usually get very poor results from tesseract.
> >> >> >>
> >> >> >> Cheers,
> >> >> >> Rob Komar
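The difference between rescaling an image and merely changing its DPI tag can be illustrated with a little arithmetic. This is a minimal sketch; the function names are hypothetical and it only computes dimensions, it does not touch any image file.

```python
def rescaled_size(width_px, height_px, old_dpi, new_dpi):
    """Pixel dimensions after truly resampling to new_dpi.

    The physical size in inches stays the same, so the pixel count
    grows (or shrinks) by new_dpi / old_dpi.
    """
    return (round(width_px * new_dpi / old_dpi),
            round(height_px * new_dpi / old_dpi))


def retagged_size(width_px, height_px, old_dpi, new_dpi):
    """Pixel dimensions after only changing the DPI tag.

    The pixel data is untouched; only the claimed physical size changes.
    """
    return (width_px, height_px)


if __name__ == "__main__":
    print(rescaled_size(640, 480, 72, 300))  # many more pixels per character
    print(retagged_size(640, 480, 72, 300))  # same pixels as before
```

A 640x480 image that still measures 640x480 after the change has exactly the same number of pixels per character as before, whatever its DPI tag says, so it gives an OCR engine nothing new to work with.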
> >> >> >
> >> >> > --
> >> >> > You received this message because you are subscribed to the Google
> >> >> > Groups "tesseract-ocr" group.
> >> >> > To post to this group, send email to [email protected]
> >> >> > To unsubscribe from this group, send email to
> >> >> > [email protected]
> >> >> > For more options, visit this group at
> >> >> > http://groups.google.com/group/tesseract-ocr?hl=en