Re: More fuzzy issues - encouraging bad spelling?

2004-12-23 Thread markharw00d
>>That's quick. Do you have a time shrinking machine there? :) Actually, time's up. It'll be after Christmas before I spend any more time on this now but initial results looked promising so I'll make some code available, probably in the new year. I've got an update to the highlighter to release t

Re: More fuzzy issues - encouraging bad spelling?

2004-12-23 Thread Paul Elschot
Mark, On Thursday 23 December 2004 22:20, markharw00d wrote: > Thanks for the suggestions, Paul. > > I've just tried a scheme using the max docFreq of the expanded terms as > the docFreq shared by all expanded terms in their idf calculations > (giving a lower, shared, IDF) and I'm still removin

Re: More fuzzy issues - encouraging bad spelling?

2004-12-23 Thread markharw00d
Thanks for the suggestions, Paul. I've just tried a scheme using the max docFreq of the expanded terms as the docFreq shared by all expanded terms in their idf calculations (giving a lower, shared, IDF) and I'm still removing the coordination factor on the BooleanQuery that groups the term queri
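The shared-IDF scheme above can be sketched as follows. This is a minimal illustration, not Mark's actual code. It assumes Lucene's classic DefaultSimilarity idf formula, idf = ln(numDocs / (docFreq + 1)) + 1. The term list, the document counts and the helper `sharedIdf` are hypothetical. The coordination factor is not modelled here. `Map.of` needs Java 9 or later.

```java
import java.util.Map;

/** Sketch: every fuzzy expansion shares one idf, derived from the largest docFreq among them. */
public class SharedIdfDemo {

    // Classic Lucene DefaultSimilarity idf: ln(numDocs / (docFreq + 1)) + 1
    public static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Hypothetical helper: the idf all expanded terms would share (max docFreq => lowest idf).
    public static double sharedIdf(Map<String, Integer> docFreqs, int numDocs) {
        int maxDf = 0;
        for (int df : docFreqs.values()) {
            maxDf = Math.max(maxDf, df);
        }
        return idf(maxDf, numDocs);
    }

    public static void main(String[] args) {
        int numDocs = 10_000;
        // Hypothetical expansions of one fuzzy term with made-up document frequencies.
        Map<String, Integer> docFreqs = Map.of("spelling", 2_000, "speling", 3, "spilling", 150);
        double shared = sharedIdf(docFreqs, numDocs);
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            System.out.printf("%-10s own idf=%.3f shared idf=%.3f%n",
                    e.getKey(), idf(e.getValue(), numDocs), shared);
        }
    }
}
```

Because every expansion gets the same (low) idf, a rare misspelling no longer outweighs the common term on idf alone. Differences between matches then come from tf, norms and any edit-distance boost.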

Re: More fuzzy issues - encouraging bad spelling?

2004-12-23 Thread Paul Elschot
Mark, On Thursday 23 December 2004 14:25, mark harwood wrote: > Another thought on fuzzy scoring: > shouldn't all these queries which automatically expand > terms favour common words over rare ones? The default > scoring behaviour at the moment favours rare words. As > a user aren't I more likely

More fuzzy issues - encouraging bad spelling?

2004-12-23 Thread mark harwood
Another thought on fuzzy scoring: shouldn't all these queries which automatically expand terms favour common words over rare ones? The default scoring behaviour at the moment favours rare words. As a user aren't I more likely to be looking for the most common expansions? If I'm not sure how to sp
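The bias toward rare words comes from idf. The following is a minimal sketch that assumes Lucene's classic DefaultSimilarity idf formula, idf = ln(numDocs / (docFreq + 1)) + 1, and uses hypothetical term counts. It shows why a rare misspelled expansion outranks the common, correctly spelled one when each expansion keeps its own idf.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: per-term idf ranks fuzzy expansions, favouring the rarest (hypothetical counts). */
public class FuzzyIdfDemo {

    // Classic Lucene DefaultSimilarity idf: ln(numDocs / (docFreq + 1)) + 1
    public static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 10_000;
        // Hypothetical expansions of a fuzzy query term and their document frequencies.
        Map<String, Integer> expansions = new LinkedHashMap<>();
        expansions.put("spelling", 2_000); // common, correctly spelled
        expansions.put("speling", 3);      // rare misspelling
        for (Map.Entry<String, Integer> e : expansions.entrySet()) {
            System.out.printf("%-10s docFreq=%5d idf=%.3f%n",
                    e.getKey(), e.getValue(), idf(e.getValue(), numDocs));
        }
        // The rare misspelling gets the larger idf, so, other factors being equal,
        // documents containing it score higher than documents with the common spelling.
    }
}
```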