Shalin Shekhar Mangar wrote:
On Sun, Feb 15, 2009 at 8:56 AM, Mark Miller <markrmil...@gmail.com> wrote:

I think that's the problem with it. People do think of it this way, and it
ends up being very confusing.

If you don't use onlyMorePopular, and you ask for suggestions for a word
that happens to be in the index, you get the word back.

So if I ask for corrections to Lucene, and it's in the index, it suggests
Lucene. This is nice for multi-term suggestions, because for "mrk lucene" it
might suggest "mark lucene".

Now say I want to toggle onlyMorePopular to add frequency into the mix - my
expectation is that, perhaps now I will get the suggestion "mork lucene" if
mork has a higher freq than mark.

But I will get maybe "mork luke" instead, because I am guaranteed not to
get Lucene as a suggestion if onlyMorePopular is on.


onlyMorePopular=true considers tokens with a frequency greater than or equal
to the frequency of the original token. So you may still get Lucene as a
suggestion.

Is that the only difference? When I look at the code (I'm new to this area of the code, so I could certainly be wrong; it wouldn't be the first time, and probably not the 100,000th either), I see:

   // if the word exists in the real index and we don't care for word frequency, return the word itself
   if (!morePopular && freq > 0) {
     return new String[] { word };
   }

So with onlyMorePopular=false, a query for Lucene gets Lucene back if it's in the index. But if we make it past that check (onlyMorePopular=true), later there is:

     // don't suggest a word for itself, that would be silly
     if (sugWord.string.equals(word)) {
       continue;
     }

So you end up getting all of the suggestions *but* Lucene, right? You have to already know the word is misspelled, and now you're asking for a better one. With onlyMorePopular=false, you only get a correction if the word is misspelled (that is, not in the index).
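
To make the two paths concrete, here is a toy model of the logic quoted above. It is only a sketch, not the SpellChecker source: the class name, the in-memory index, and the candidate list passed in by hand are all made up for illustration, and the frequency filter is my assumption based on Shalin's "greater than or equal to" description. Ranking by edit distance is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the two checks quoted above; not Lucene code.
final class SuggestSketch {

    // Made-up index: term -> frequency.
    static final Map<String, Integer> INDEX = new HashMap<>();
    static {
        INDEX.put("lucene", 10);
        INDEX.put("luke", 12);
        INDEX.put("mark", 5);
        INDEX.put("mork", 8);
    }

    // candidates stands in for whatever similar words the spell index returns.
    static List<String> suggest(String word, List<String> candidates, boolean morePopular) {
        int freq = INDEX.getOrDefault(word, 0);

        // First quoted check: the word is in the index and frequency is ignored,
        // so the word itself is returned.
        if (!morePopular && freq > 0) {
            return Collections.singletonList(word);
        }

        List<String> out = new ArrayList<>();
        for (String candidate : candidates) {
            // Second quoted check: never suggest the word for itself.
            if (candidate.equals(word)) {
                continue;
            }
            // Assumed filter: with morePopular, keep only candidates whose
            // frequency is at least that of the original word.
            if (morePopular && INDEX.getOrDefault(candidate, 0) < freq) {
                continue;
            }
            out.add(candidate);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> forLucene = Arrays.asList("lucene", "luke");
        System.out.println(suggest("lucene", forLucene, false)); // [lucene]
        System.out.println(suggest("lucene", forLucene, true));  // [luke]
        System.out.println(suggest("mrk", Arrays.asList("mark", "mork"), false)); // [mark, mork]
    }
}
```

In this model the word's own frequency does satisfy "greater than or equal to", yet with morePopular=true it is still dropped by the equality check, which is the extra behavior change I mean.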

It seems to me that, if you are trying to use the suggested query that's built up, you change the behavior beyond just:

onlyMorePopular=true considers tokens with a frequency greater than or equal
to the frequency of the original token.

- Mark


