Hi, I'm using the latest version from the repository (v3?).

My OCR training language is Catalan. I'm using the Spanish training set from
the download page of this Google group to train all characters.

My word list has about 500,000 words (in fact 250,000 words, each in
lowercase and uppercase versions). OCR is fast at recognition (creating
the DAWG file takes about 5 minutes) and very precise: if a word is in
the txt file, Tesseract will fix any misspelling in the image.
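
To make the list structure concrete, here is a minimal sketch of how a list with both case variants can be produced. It assumes "uppercase version" means the fully uppercased word and that the input is one word per entry; the function name and the sample words are illustrative only, not my actual setup.

```python
def expand_case_variants(words):
    """Return each word in lowercase and uppercase form.

    Blank entries are skipped and duplicates are dropped,
    keeping the order in which variants are first seen.
    """
    seen = set()
    result = []
    for word in words:
        word = word.strip()
        if not word:
            continue
        # Assumption: "uppercase version" means the whole word uppercased.
        for variant in (word.lower(), word.upper()):
            if variant not in seen:
                seen.add(variant)
                result.append(variant)
    return result


if __name__ == "__main__":
    # Hypothetical sample; a real list would be read from the txt file.
    sample = ["casa", "Arbre", "casa", ""]
    for entry in expand_case_variants(sample):
        print(entry)
```

The output, one word per line, is the kind of txt file that is then turned into the DAWG file.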

The next step is avoiding ( | > I ) errors. I'm reading about how to constrain
the character set used in recognition. Is there a file to do
that?

I'm missing more v3 training information.

Ramon.
