On Mar 11, 2010, at 1:34 PM, Chris Hostetter wrote:

> I wonder if one way to try and generalize
> the idea of "unlikely" letter combinations into a math problem (instead of
> grammar/spelling problem) would be to score all the hapax legomenon
> words in your index

Hmm, how about a classifier? Common words are the "yes" training set, hapax legomena are the "no" set, and n-grams are the features.

But why isn't the OCR program already doing this?

wunder
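
A rough sketch of that classifier idea, in case it helps make it concrete. Everything here is an assumption for illustration: character trigrams as the n-grams, a naive Bayes model with add-one smoothing as the classifier, and two tiny made-up word lists standing in for the common-word and hapax sets you would really pull from the index. The names (`char_ngrams`, `NgramNaiveBayes`, `log_odds`) are hypothetical, not anything from Lucene or Solr.

```python
import math
from collections import Counter


def char_ngrams(word, n=3):
    """Return the character n-grams of a word, with ^ and $ marking the edges."""
    padded = "^" + word.lower() + "$"
    if len(padded) < n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Two-class naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {"yes": Counter(), "no": Counter()}  # n-gram counts per class
        self.totals = {"yes": 0, "no": 0}                  # total n-grams per class
        self.words = {"yes": 0, "no": 0}                   # training words per class
        self.vocab = set()

    def train(self, words, label):
        for word in words:
            self.words[label] += 1
            for gram in char_ngrams(word, self.n):
                self.counts[label][gram] += 1
                self.totals[label] += 1
                self.vocab.add(gram)

    def log_odds(self, word):
        """Log odds that a word looks like a real word; positive means 'yes'."""
        v = len(self.vocab) + 1  # +1 leaves room for unseen n-grams
        n_words = self.words["yes"] + self.words["no"]
        # Smoothed class priors.
        score = (math.log((self.words["yes"] + 1) / (n_words + 2))
                 - math.log((self.words["no"] + 1) / (n_words + 2)))
        for gram in char_ngrams(word, self.n):
            p_yes = (self.counts["yes"][gram] + 1) / (self.totals["yes"] + v)
            p_no = (self.counts["no"][gram] + 1) / (self.totals["no"] + v)
            score += math.log(p_yes) - math.log(p_no)
        return score


# Made-up training data: common words vs. hapax terms that look like OCR errors.
common_words = ["the", "there", "other", "then", "this", "that", "with", "them"]
hapax_words = ["tbe", "tlie", "otlier", "tliis", "wbth", "tbat"]

model = NgramNaiveBayes(n=3)
model.train(common_words, "yes")
model.train(hapax_words, "no")

for candidate in ["these", "tliese"]:
    print(candidate, round(model.log_odds(candidate), 2))
```

One caveat with this setup: a hapax legomenon can be a perfectly good rare word, so the "no" set is noisy, and the scores are probably best treated as a ranking of suspicious terms rather than a hard yes/no.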