Dmitri,
Thanks for the valuable suggestion.
With regards,
-sriranga(78yrs)

On Sat, Aug 20, 2011 at 5:49 PM, Dmitri Silaev <[email protected]> wrote:
> As a rule of thumb, you can usually obtain good recognition
> results for all standard regular fonts of 11-16 pt size, be it a
> screenshot or a 300 DPI scanned image. Should the font size, resolution,
> etc. differ significantly from these numbers, recognition quality
> becomes a matter of experimentation.
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
> On Sat, Aug 20, 2011 at 2:14 PM, Sriranga(78yrsold)
> <[email protected]> wrote:
> > Dmitri,
> > Really, the issue is very complicated for a layman user to understand.
> > For training purposes in tesseract-ocr, what guidance from your
> > expertise should be followed by users who generally depend on a
> > scanner and the "Print Screen" key of the computer?
> > 1) For scanning typed text: (a) the font size that should be used in
> > the text; (b) the resolution to be set in the scanner.
> > 2) For a screenshot of a typed text file: with the help of IrfanView,
> > ImageMagick, etc., the resolution should be increased from 96 to 300 dpi
> > for any image format such as tif, png, etc.
> > With regards,
> > -sriranga(78yrs)
> >
> > On Sat, Aug 20, 2011 at 1:33 PM, Dmitri Silaev <[email protected]>
> > wrote:
> >>
> >> There are different cases of how the pixel height of a font's character
> >> should be calculated. If you're trying to recognize a screenshot, you
> >> may deem one pt to be equal to one pixel when typing it in Windows
> >> Paint. However, this might not be true for more complex editors like
> >> Photoshop. It also depends on the physical size of the screen's pixel
> >> and the current video mode resolution. Another case is a scanned image;
> >> here pixel height depends on the scanning resolution.
> >> Still another case, where imho trying to relate pixel height to a
> >> font's point size makes no sense at all (although it is possible via
> >> some multi-parameter formulas), is a photographic or video frame image;
> >> here pixel height varies with the camera position and can even vary
> >> within a single line of text.
> >>
> >> All in all, Tesseract does not concern itself with DPIs, pt sizes,
> >> etc.; only pixel size is important for recognition. You can use this
> >> formula for scanned images to roughly determine font pixel height:
> >>
> >>     pixels = DPI * pts / 72
> >>
> >> where pixels is the pixel height to be found, DPI is the scanning
> >> resolution, and pts is the font size in typographic points.
> >>
> >> However, the most reliable approach is to scan a test page and manually
> >> count pixels.
> >>
> >> For those willing to understand everything, here are the links:
> >> http://en.wikipedia.org/wiki/Dots_per_inch
> >> http://en.wikipedia.org/wiki/Point_%28typography%29
> >> http://en.wikipedia.org/wiki/X-height
> >>
> >> Warm regards,
> >> Dmitri Silaev
> >> www.CustomOCR.com
> >>
> >> On Sat, Aug 20, 2011 at 7:48 AM, Sriranga(78yrsold)
> >> <[email protected]> wrote:
> >> > Dmitri,
> >> > Thanks for the valuable guidance. I seek some clarification as follows:
> >> > (1) "Tesseract, trained with ordinary fonts, proved good with fonts of
> >> > 12-64 pixel height": it would be nice if you could indicate the
> >> > equivalent font size for a pixel height of 12-64. For a regular
> >> > (ordinary) font of 10 or 20 pt, what is the pixel height used in Notepad?
> >> > I am neither a programmer nor a developer, so I am seeking guidance
> >> > as a user.
> >> > BTW, is it possible to count the pixels of any size, say 20 pt
> >> > regular, in Paintbrush, which has a grid (graph-like)? Just
> >> > now I tested in Paintbrush; see the attached screenshot.
> >> > The letters were typed in Arial 20, and I counted the pixels: they
> >> > are 20 pixels high.
> >> >
> >> > Thus I presume that a pixel height of 12-64 is equivalent to a 12-64
> >> > point size of an ordinary font. Kindly confirm.
> >> > With warmest regards,
> >> > -sriranga(78yrs)
> >> >
> >> > On Sat, Aug 20, 2011 at 1:00 AM, Dmitri Silaev <[email protected]>
> >> > wrote:
> >> >>
> >> >> The DPI measure is confusing for Tesseract's OCR; forget about it. The
> >> >> big thing is the within-image font's x-height, measured in pixels.
> >> >> Tesseract, trained with ordinary fonts, proved good with fonts of
> >> >> 12-64 pixel height. If you have bigger characters, scale them down. If
> >> >> you have a font that's bold, use morphology and erode characters after
> >> >> binarization. Experiment. Removing "greyness" won't help, as it's not a
> >> >> generic way of getting rid of uneven illumination; you need to use
> >> >> more sophisticated algorithms. Just using Photoshop won't let you
> >> >> achieve much.
> >> >>
> >> >> Warm regards,
> >> >> Dmitri Silaev
> >> >> www.CustomOCR.com
> >> >>
> >> >> On Fri, Aug 19, 2011 at 8:18 PM, Andriy Malovanyy <[email protected]>
> >> >> wrote:
> >> >> > To Zdenko:
> >> >> > I think I have version 3.0 installed, so maybe I should reinstall
> >> >> > with the new version and try it. Thanks for the description of psm.
> >> >> > Did you try to recognize the other unedited images I attached to
> >> >> > the first post?
> >> >> >
> >> >> > To Rob:
> >> >> > Initially I had a 640x480 image at 72 dpi, with the number occupying
> >> >> > almost the whole image. All I did was open the image in Photoshop, go
> >> >> > to the image size menu, change the resolution to 300 dpi (the image
> >> >> > increased in size), and set the image size back to 640x480. So I
> >> >> > ended up with a 640x480 image at 300 dpi.
> >> >> >
> >> >> > On 19 Aug, 17:56, Robert Komar <[email protected]> wrote:
> >> >> >> On Fri, 19 Aug 2011, Andriy Malovanyy wrote:
> >> >> >> > To sriranga:
> >> >> >> > I tried changing the dpi (check the previous post). It doesn't work.
> >> >> >>
> >> >> >> Did you rescale the image from 72 dpi to 300 dpi, or just change
> >> >> >> the tag on the original image to say 300 dpi? The latter won't
> >> >> >> work. Tesseract seems to be tuned to work best for scans at 300 dpi
> >> >> >> (although I've often successfully used 600 dpi). Scans done at
> >> >> >> 72 dpi usually get very poor results from tesseract.
> >> >> >>
> >> >> >> Cheers,
> >> >> >> Rob Komar
> >> >> >
> >> >> > --
> >> >> > You received this message because you are subscribed to the Google
> >> >> > Groups "tesseract-ocr" group.
> >> >> > To post to this group, send email to [email protected]
> >> >> > To unsubscribe from this group, send email to
> >> >> > [email protected]
> >> >> > For more options, visit this group at
> >> >> > http://groups.google.com/group/tesseract-ocr?hl=en
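Dmitri's rule of thumb for scanned images, pixels = DPI * pts / 72, is plain arithmetic: a typographic point is 1/72 inch, so pts / 72 is the size in inches and multiplying by DPI gives pixels. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the formula yields the nominal font size rather than the x-height Dmitri ties the 12-64 px range to, and scaling undersized text up is an assumption not made in the thread (it only advises scaling large characters down).

```python
def pixel_height(dpi: float, pts: float) -> float:
    """Approximate pixel height of a font size: pixels = DPI * pts / 72.

    One typographic point is 1/72 inch, so pts / 72 is the size in inches.
    This is the nominal font size, not the x-height.
    """
    return dpi * pts / 72.0


def scale_into_range(height_px: float, lo: float = 12.0, hi: float = 64.0) -> float:
    """Hypothetical helper: resize factor that brings a measured glyph height
    into [lo, hi] pixels (default: the 12-64 px range quoted in the thread).

    Scaling oversized text down follows the thread's advice; scaling
    undersized text up is an assumption. Returns 1.0 if already in range.
    """
    if height_px > hi:
        return hi / height_px
    if height_px < lo:
        return lo / height_px
    return 1.0


for dpi in (96, 300):
    for pts in (10, 12, 20):
        print(f"{pts} pt at {dpi} DPI -> about {pixel_height(dpi, pts):.1f} px")

# Hypothetical x-height of 80 px, counted by hand on a test scan.
print(scale_into_range(80))  # 0.8
```

For example, 12 pt comes out at 50 px when scanned at 300 DPI and at 16 px at 96 DPI. As Dmitri notes, counting pixels on a real test scan is more reliable than the formula, so the measured height is what scale_into_range expects.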

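Rob's distinction between rescaling an image and merely retagging its DPI, and Andriy's two Photoshop steps, can be checked with a toy model that tracks only the pixel grid and the DPI tag. Everything here (the Image class and the retag, resample and resize_px helpers) is hypothetical and only stands in for what an image editor does; no real imaging library is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    width_px: int
    height_px: int
    dpi: float  # metadata tag only; recognition works on the pixel grid


def retag(img: Image, new_dpi: float) -> Image:
    # Change only the DPI tag: pixel grid (and glyph pixel heights) unchanged.
    return Image(img.width_px, img.height_px, new_dpi)


def resample(img: Image, new_dpi: float) -> Image:
    # Keep the physical size in inches, so the pixel grid grows or shrinks.
    s = new_dpi / img.dpi
    return Image(round(img.width_px * s), round(img.height_px * s), new_dpi)


def resize_px(img: Image, width_px: int, height_px: int) -> Image:
    # Set the pixel dimensions directly, keeping the current DPI tag.
    return Image(width_px, height_px, img.dpi)


original = Image(640, 480, 72)
rescaled = resample(original, 300)           # Andriy's first step
andriy = resize_px(rescaled, 640, 480)       # Andriy's second step
print(rescaled)
print(andriy == retag(original, 300))
```

The rescaled image is 2667x2000 pixels, and the final comparison prints True: after the second step the image is back on its original 640x480 grid, so the characters have the same pixel height they started with and only the DPI tag differs, which is the case Rob says won't work. Note also that upscaling cannot add detail the 72 dpi original lacks.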

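Dmitri's suggestion for bold fonts, eroding characters after binarization, is a standard morphological operation: with a 3x3 square structuring element, every stroke loses one pixel on each side. A minimal pure-Python sketch, assuming the binarized image is a list of rows of 0/1 values (1 = ink) and that pixels outside the image count as background; in practice a library routine such as scipy.ndimage.binary_erosion does the same job, and the element size is something to experiment with.

```python
def erode(img):
    """Binary erosion of a 0/1 image with a 3x3 square structuring element.

    A pixel stays ink (1) only if it and all 8 neighbours are ink. Pixels
    on the image border always become background, because their missing
    neighbours are treated as background.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all(img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out


# A "bold" 5x5 blob of ink inside a 7x7 image.
bold = [[1 if 1 <= y <= 5 and 1 <= x <= 5 else 0 for x in range(7)] for y in range(7)]
thin = erode(bold)
for row in thin:
    print("".join("#" if v else "." for v in row))
```

The 5x5 blob shrinks to a 3x3 blob, i.e. each stroke gets one pixel thinner on every side; applying erode repeatedly thins heavier strokes further.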