On 25 May 2010 21:34, haratron <[email protected]> wrote:
> http://www.linux.com/archive/feed/57222
> "Also, it can generate output only in the US-ASCII character set, so
> glyphs with accent marks or other unsupported attributes will probably
> be reproduced incorrectly."
>
> Which is the option to make it limit output to the ASCII charset only?
> Some letters such as "a" are outputted as glyph symbols.

That article refers to an ancient version of Tesseract. Since then, Tesseract has added support for languages other than English and outputs Unicode by default.

I don't think there's any option to restrict the output to ASCII. You might want to post-process the output with something like unaccent (http://www.nongnu.org/unac/).
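
If installing unac is not convenient, the same post-processing idea can be sketched in a few lines of Python. This is only a rough illustration using the standard unicodedata module, not unac itself and not anything built into Tesseract; the function name to_ascii is made up for the example. It decomposes each character and then throws away whatever is not plain ASCII.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Reduce OCR output to plain ASCII (hypothetical helper, not part of Tesseract).

    NFKD normalization splits accented letters into a base letter plus
    combining marks and expands compatibility characters such as ligatures.
    Encoding with errors="ignore" then drops everything that is still
    outside the ASCII range.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    # Accents are stripped and the "fi" ligature is expanded.
    print(to_ascii("caf\u00e9 na\u00efve \ufb01le"))
```

Note that characters with no ASCII decomposition (for example "ß" or "ø") are simply dropped by this approach, so it is lossy. It also cannot fix a letter that Tesseract misrecognized in the first place; it only cleans up the character set of the output.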

