OK, my first guess was wrong. Although Tess was not designed to recognize screen fonts, especially those composed of one-pixel-wide strokes, it can sometimes produce satisfactory results. As for punctuation, Tess often treats it as noise (because it is only 1-3 pixels in size) and discards it completely. That is what is happening in your case. Possible solutions are:

- upscale your images by a factor of 2 or 3
- print on paper and scan, as described in the Wiki
- do your own recognition of punctuation; this is relatively easy in your case
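To illustrate the first option, here is a minimal sketch of integer upscaling by pixel replication (nearest neighbour). It assumes the image is already loaded as a grid of grayscale values; the `upscale` function and the tiny sample grid are made up for illustration and are not part of Tesseract. On real files, an imaging library's resize would do the same job.

```python
def upscale(pixels, factor):
    """Nearest-neighbour upscale of a 2D grid of pixel values by an integer factor."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    scaled = []
    for row in pixels:
        # Repeat every pixel `factor` times horizontally.
        wide_row = [value for value in row for _ in range(factor)]
        # Repeat the widened row `factor` times vertically (fresh copy each time).
        scaled.extend(list(wide_row) for _ in range(factor))
    return scaled


# A 1-pixel "." on a 3x3 white background (0 = black ink, 255 = white).
dot = [
    [255, 255, 255],
    [255, 0, 255],
    [255, 255, 255],
]
big = upscale(dot, 3)
ink = sum(value == 0 for row in big for value in row)
print(len(big), len(big[0]), ink)  # prints: 9 9 9
```

With a factor of 3, the single-pixel dot becomes a 3x3 blob, which has a better chance of surviving noise filtering. Whether that is enough for a given font is something to check experimentally.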
Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Thu, May 26, 2011 at 10:08 AM, joyse1 <[email protected]> wrote:
> When I use Tesseract with the ONE_WORD option during box creation, Tess
> recognizes the comma, but not the dot or ":". I then insert boxes for
> those characters myself. The result is as you can see in the attached pic ...
>
> On 26 May, 15:39, Joyse1 <[email protected]> wrote:
>> You will find the png, box, and apply_boxes messages in the attachment.
>>
>> Thanks in advance!
>>
>> > I think I know what could be the issue here. Refer to
>> > http://code.google.com/p/tesseract-ocr/issues/detail?id=446&can=5.
>> > Although you are using another layout mode, this issue can still hold
>> > true.
>>
>> > In brief, for small images Tess confuses background and foreground
>> > pixels. That's why it treats characters' inner holes as characters and
>> > recognizes them as such. To avoid this you can try to add more
>> > characters to the training image or make corrections to the Tesseract
>> > code - I've indicated what should be done inside the issue.
>>
>> > However, I might be wrong. To give more relevant advice I need to see
>> > your images, cmd line etc.
>>
>> > Warm regards,
>> > Dmitri Silaev
>> > www.CustomOCR.com
>>
>> > On Thu, May 26, 2011 at 5:30 AM, Joyse1 <[email protected]> wrote:
>> >> Hi,
>> >> I have a small font (Microsoft Sans Serif, 8; string to learn:
>> >> " 0 1 2 3 4 5 6 7 8 9 . , : "). I can't train recognition of
>> >> single-pixel characters (e.g. ".", ",", ":"). I get failures when
>> >> generating tr files.
>> >> I have two versions of Tess: one with the layout analyzer turned on,
>> >> and one with the one_word_only option turned on. The only difference
>> >> between them is that with the one-word option (PSM_ONE_WORD in
>> >> tesseract) it generates a box for the comma and recognizes it. So I
>> >> get failures ("no blobs ...") only for "." and ":" (with the layout
>> >> analyzer turned on I get failures for all three of them: ". , :").
>> >> I don't think that changing the one_word option to single_char could
>> >> help here. Could somebody please tell me what the solution is here
>> >> (without resizing the training images)?
>> >>
>> >> Best
>> >> Jakub
>> >>
>> >> --
>> >> You received this message because you are subscribed to the Google
>> >> Groups "tesseract-ocr" group.
>> >> To post to this group, send email to [email protected]
>> >> To unsubscribe from this group, send email to
>> >> [email protected]
>> >> For more options, visit this group at
>> >> http://groups.google.com/group/tesseract-ocr?hl=en
>>
>> Attachments: apply_boxes_info.PNG (21K), normal.box (< 1K), normal.PNG (1K)

