Post-processing is certainly not the same thing: stripping accents afterwards cannot fix a letter that was recognized as the wrong glyph in the first place. If you restrict the Tesseract engine itself to the ASCII character set, chances are you raise accuracy, because the engine is forced to pick a more sensible alternative instead of those glyphs. Anyway, I found the answer to this one. For anyone interested: http://code.google.com/p/tesseract-ocr/wiki/FAQ See the "only digits" section. Instead of the digits, you just list your allowed characters in the tessedit_char_whitelist variable (a-z in my case).
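A minimal sketch of the FAQ's config-file approach, assuming Tesseract reads config files from its tessdata/configs directory and accepts a config name as the last command-line argument. The config name `letters` and the helper functions are hypothetical; check the FAQ for the exact invocation your Tesseract version expects.

```python
import string
from pathlib import Path

# Characters the engine may output: a-z, as in the post above.
ALLOWED = string.ascii_lowercase
# Hypothetical config name; the FAQ's own example uses "digits".
CONFIG_NAME = "letters"


def write_whitelist_config(configs_dir: Path, allowed: str = ALLOWED,
                           name: str = CONFIG_NAME) -> Path:
    """Write a Tesseract config file that sets tessedit_char_whitelist."""
    configs_dir.mkdir(parents=True, exist_ok=True)
    path = configs_dir / name
    # Config files hold one "variable value" pair per line.
    path.write_text(f"tessedit_char_whitelist {allowed}\n", encoding="ascii")
    return path


def build_command(image: str, output_base: str, config: str = CONFIG_NAME) -> list:
    """Return an argv list that runs tesseract with the named config file."""
    return ["tesseract", image, output_base, config]
```

You would call `write_whitelist_config` with your installation's tessdata/configs path, then pass `build_command("page.tif", "out")` to `subprocess.run`. Because the whitelist is applied during recognition, the engine never proposes an accented glyph in the first place.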
On Wed, May 26, 2010 at 7:07 AM, Sriranga(77yrsold) <[email protected]> wrote:
> Post-processing is an excellent idea.
> -sriranga(77yrsold)
>
> On Wed, May 26, 2010 at 8:39 AM, nguyenq <[email protected]> wrote:
>>
>> You can perform some text manipulation in a post-processing step to
>> strip out diacritical marks, leaving only the base ASCII characters
>> behind.
>>
>> On May 25, 3:34 pm, haratron <[email protected]> wrote:
>> > http://www.linux.com/archive/feed/57222
>> > "Also, it can generate output only in the US-ASCII character set, so
>> > glyphs with accent marks or other unsupported attributes will probably
>> > be reproduced incorrectly."
>> >
>> > Which option limits the output to the ASCII character set only?
>> > Some letters such as "a" are output as glyph symbols.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en.
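For comparison, the post-processing step nguyenq suggests in the quoted thread can be sketched with Python's standard `unicodedata` module. This is one common way to strip diacritics, not necessarily what nguyenq had in mind; the function name `to_ascii` is hypothetical.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Strip diacritical marks and drop anything that is not ASCII."""
    # NFKD splits e.g. "é" into "e" + U+0301 COMBINING ACUTE ACCENT,
    # and expands compatibility ligatures such as "ﬁ" into "fi".
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping combining marks leaves only the base letters.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Characters with no ASCII decomposition (e.g. "ß", "ø") are discarded here.
    return without_marks.encode("ascii", "ignore").decode("ascii")
```

For example, `to_ascii("naïve façade")` returns `"naive facade"`. As noted in the reply above, this only cleans up output: if the engine already misread a plain "a" as some other glyph, stripping marks afterwards cannot restore it.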

