On Wed, Oct 8, 2008 at 3:05 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> chane is in the dictionary.  For better or worse, Lucene skips words that
> are in the dictionary when OMP is false.

Ah, I see.  I think we'll use OMP=true, which seems like a reasonable
setting anyway.

> Makes sense to me.  I could see the Spellchecker being modified (in Lucene)
> to provide alternate scoring/sorting.  Right now, you can use other distance
> measures, as well, so you could codify your idea and try it out to see if it
> is better (and then donate it!)
> You might try the Jaro-Winkler measure, too, as it is a bit more
> sophisticated than Levenshtein when it comes to scoring.

I just tried J-W and *yes* it seems to do a much better job!  I'd
certainly vote for that becoming the default :)

Thanks for all the help!  Much appreciated.

Jason

--
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http://www.stylefeeder.com/