On Mar 11, 2010, at 1:34 PM, Chris Hostetter wrote:

> I wonder if one way to try and generalize 
> the idea of "unlikely" letter combinations into a math problem (instead of 
> grammar/spelling problem) would be to score all the hapax legomenon 
> words in your index

Hmm, how about a classifier? Common words are the "yes" training set, hapax 
legomena are the "no" set, and n-grams are the features.
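
A rough sketch of what I mean, assuming character trigrams as the n-grams and
a plain Naive Bayes model with add-one smoothing (my choice for illustration,
any classifier would do). The names char_ngrams and NgramNaiveBayes and the
tiny word lists are made up for this example; real training sets would come
from the index's term frequencies.

```python
import math
from collections import Counter


def char_ngrams(word, n=3):
    """Return the character n-grams of a word, padded with ^ and $ markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + n] for i in range(max(len(padded) - n + 1, 1))]


class NgramNaiveBayes:
    """Toy two-class Naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {"yes": Counter(), "no": Counter()}  # n-gram counts per class
        self.totals = {"yes": 0, "no": 0}                  # total n-grams per class
        self.words = {"yes": 0, "no": 0}                   # training words per class
        self.vocab = set()                                 # all n-grams seen

    def train(self, words, label):
        for word in words:
            grams = char_ngrams(word, self.n)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.words[label] += 1
            self.vocab.update(grams)

    def log_odds(self, word):
        """Log odds that a word is a real word; positive leans "yes"."""
        v = len(self.vocab)
        # Smoothed class prior, expressed as a log ratio.
        score = math.log((self.words["yes"] + 1) / (self.words["no"] + 1))
        for gram in char_ngrams(word, self.n):
            p_yes = (self.counts["yes"][gram] + 1) / (self.totals["yes"] + v)
            p_no = (self.counts["no"][gram] + 1) / (self.totals["no"] + v)
            score += math.log(p_yes / p_no)
        return score


# Hypothetical training data: common terms vs. hapax legomena from the index.
model = NgramNaiveBayes(n=3)
model.train(["the", "there", "then", "other", "these", "together"], "yes")
model.train(["tlie", "rnore", "xqzt", "ihe", "tbe", "wlth"], "no")

for candidate in ["the", "xqzt"]:
    print(candidate, round(model.log_odds(candidate), 3))
```

Terms with a strongly negative score would be the candidates for OCR garbage.
Not every hapax legomenon is junk, so the "no" set is noisy and the cutoff
would need tuning against a hand-checked sample.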

But why isn't the OCR program already doing this?

