Post-processing is certainly not the same thing. If you restrict the
tesseract engine itself to the ASCII charset, chances are that you
raise accuracy, because the engine is forced to pick the most plausible
ASCII character instead of settling on an accented glyph.
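
For comparison, the post-processing approach suggested below would look
roughly like this. It is only a sketch in Python (my choice of language,
and strip_diacritics is a made-up name); it cleans up the text after the
fact and cannot change what the engine recognized in the first place:

```python
import unicodedata

def strip_diacritics(text):
    # Split each accented character into its base letter plus combining marks,
    # then drop the marks (and anything else that is not ASCII).
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Characters that have no ASCII base letter are silently dropped by this
approach, which is another reason it is not equivalent to restricting
the engine.
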
Anyway, I found the answer to this one.
For anyone interested:
http://code.google.com/p/tesseract-ocr/wiki/FAQ
Search for the "only digits" section. Instead of the digits, you
define your own set of allowed characters (a-z in my case).
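
A minimal sketch of generating such a config, assuming the
tessedit_char_whitelist variable that the FAQ's digits example relies
on. The helper names and the "letters" file name are made up for
illustration:

```python
import string
from pathlib import Path

def whitelist_config_line(allowed=string.ascii_lowercase):
    # One config line: the variable name, a space, then the allowed characters.
    return "tessedit_char_whitelist " + allowed + "\n"

def write_whitelist_config(path, allowed=string.ascii_lowercase):
    # Write the config file; "path" is wherever your install keeps its configs.
    Path(path).write_text(whitelist_config_line(allowed), encoding="ascii")

if __name__ == "__main__":
    # "letters" is a hypothetical config name, analogous to the stock "digits".
    write_whitelist_config("letters")
```

The resulting file is then used the same way as the digits config in the
FAQ. The exact command line depends on your tesseract version, so check
the FAQ for that part.
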

On Wed, May 26, 2010 at 7:07 AM, Sriranga(77yrsold)
<[email protected]> wrote:
> Post-processing steps are an excellent idea.
> -sriranga(77yrsold)
>
> On Wed, May 26, 2010 at 8:39 AM, nguyenq <[email protected]> wrote:
>>
>> You can perform some text manipulations in post-processing steps to
>> strip out diacritical marks to leave only the base ASCII characters
>> behind.
>>
>> On May 25, 3:34 pm, haratron <[email protected]> wrote:
>> > http://www.linux.com/archive/feed/57222
>> > "Also, it can generate output only in the US-ASCII character set, so
>> > glyphs with accent marks or other unsupported attributes will probably
>> > be reproduced incorrectly."
>> >
>> > Which is the option to make it limit output to the ASCII charset only?
>> > Some letters such as "a" are output as glyph symbols.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>
