On Wed, Oct 8, 2008 at 3:05 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> chane is in the dictionary.  For better or worse, Lucene skips words that
> are in the dictionary when OMP is false.


Ah, I see.  I think we'll use OMP=true (onlyMorePopular), which seems like
a reasonable setting anyway.


> Makes sense to me.  I could see the Spellchecker being modified (in Lucene)
> to provide alternate scoring/sorting.  Right now, you can use other distance
> measures, as well, so you could codify your idea and try it out to see if it
> is better (and then donate it!)
> You might try the Jaro-Winkler measure, too, as it is a bit more
> sophisticated than Levenshtein when it comes to scoring.
>

I just tried J-W and *yes* it seems to do a much better job!  I'd certainly
vote for that becoming the default :)
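
For anyone following along, here is a rough sketch of why the two measures
can rank suggestions differently.  It is a textbook version of both
measures in plain Python, not Lucene's implementation (Lucene's
JaroWinklerDistance may differ in details such as thresholds), and the
function names, the 0.1 prefix scale and the 4-character prefix cap are
assumptions chosen for illustration.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def levenshtein_similarity(a, b):
    """Edit distance rescaled to [0, 1], where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaro(a, b):
    """Jaro similarity in [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters count as matching only within this sliding window.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_matched[j] and b[j] == ca:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order, counted in pairs.
    a_seq = [c for c, m in zip(a, a_matched) if m]
    b_seq = [c for c, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a)
            + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a, b, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity plus a bonus for a shared prefix."""
    score = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:max_prefix], b[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1.0 - score)
```

With a made-up target word "chance", the candidates "chane" and "hance"
are both one edit away, so Levenshtein cannot separate them, while
Jaro-Winkler scores "chane" higher because it shares a four-letter prefix
with the target.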

Thanks for all the help!  Much appreciated.

Jason

-- 
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http://www.stylefeeder.com/
