On 27 July 2010 21:04, patrickq <patrick.questemb...@gmail.com> wrote:
> Keep in mind that accuracy depends heavily on the right fonts being
> included in the training set. I have no reason to believe that the
> 2.04 and 3.0 training sets are identical - perhaps someone could
> enlighten us.

One of the Tesseract papers mentions that the training data was
extended with thousands of pages from Google Books, but whether
that's what is actually in the language packs... who knows?

> In any case, I routinely come across certain pages
> where recognition is terrible and where there is no doubt that the
> cause is a missing font.



-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
