Jimi,

Generally speaking, spellcheck does not work well against fields with stemming, 
or other "heavy" analysis.  I would <copyField /> to a field that is tokenized 
on whitespace with little else, and use that field for spellcheck.

By default, the spellchecker does not suggest for words that are in the index.
So if the user misspells a word but the misspelling is actually some other word
that is indexed, it will never suggest.  You can override this behavior by
specifying "spellcheck.alternativeTermCount" with a value >0.  This is how
many suggestions it should give for words that do exist in the index.  This
can be the same value as "spellcheck.count", but you may wish to set it to a
lower value.

I do not recommend using "spellcheck.onlyMorePopular".  It is similar to 
"spellcheck.alternativeTermCount", but in my opinion, the latter gives a better 
experience.

You might also wish to set "spellcheck.maxResultsForSuggest".  If you set this, 
then the spellchecker will not suggest anything if more results are returned 
than the value you specify.  This is helpful in providing "did you mean"-style 
suggestions for queries that return few results.

If you would like to ensure the suggestions combine nicely into a re-written 
query that returns results, then specify "spellcheck.collate=true" and set 
"spellcheck.maxCollationTries" to a value >0 (possibly 5-10).  This will cause 
it to internally check the re-written queries (a.k.a. collations) and report 
back on how many results you get for each.  If you are using "q.op=OR" or a low 
value for "mm", then you will likely want to override this with something like 
"spellcheck.collateParam.mm=100%".  Otherwise every combination will get 
reported as returning results.
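
As a rough illustration only, the parameters discussed above could be assembled 
into a request like the sketch below.  The numeric values, the "/select" handler 
path and the class and method names are assumptions made for the example, not 
recommendations.  It uses "spellcheck.collateParam.q.op=AND" as one way to make 
the internal collation checks stricter than the main query.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds a query string carrying the spellcheck
// parameters discussed in this thread. All values are illustrative.
class SpellcheckRequestSketch {

    static Map<String, String> spellcheckParams() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("spellcheck", "true");
        p.put("spellcheck.count", "5");
        // Also suggest for terms that do exist in the index.
        p.put("spellcheck.alternativeTermCount", "3");
        // Only suggest when the query returns at most this many hits.
        p.put("spellcheck.maxResultsForSuggest", "5");
        // Build collations and test them against the index.
        p.put("spellcheck.collate", "true");
        p.put("spellcheck.maxCollationTries", "10");
        // Require all terms when collations are tested internally.
        p.put("spellcheck.collateParam.q.op", "AND");
        return p;
    }

    // URL-encodes the user query followed by the spellcheck parameters.
    static String toQueryString(String userQuery) {
        Map<String, String> all = new LinkedHashMap<>();
        all.put("q", userQuery);
        all.putAll(spellcheckParams());
        return all.entrySet().stream()
                .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "/select" is an assumed request handler path.
        System.out.println("/select?" + toQueryString("georg washington"));
    }
}
```

The same parameters can of course be set as defaults on the request handler 
instead of being sent with every query.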

I hope this and the other comments you've gotten help demystify spellcheck 
configuration.  I do agree it is fairly complicated and frustrating to get it 
just right.

James Dyer
Ingram Content Group

-----Original Message-----
From: jimi.hulleg...@svensktnaringsliv.se 
[mailto:jimi.hulleg...@svensktnaringsliv.se] 
Sent: Friday, January 13, 2017 5:16 AM
To: solr-user@lucene.apache.org
Subject: RE: Can't get spelling suggestions to work properly

I just noticed why setting maxResultsForSuggest to a high value was not a good 
thing: now it shows spelling suggestions even for correctly spelled words.

I think what I would need is the logic of 
SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX, but with a configurable limit instead of 
it being hard coded to 0, i.e. just as maxQueryFrequency works.

/Jimi

-----Original Message-----
From: jimi.hulleg...@svensktnaringsliv.se 
[mailto:jimi.hulleg...@svensktnaringsliv.se] 
Sent: Friday, January 13, 2017 5:56 PM
To: solr-user@lucene.apache.org
Subject: RE: Can't get spelling suggestions to work properly

Hi Alessandro,

Thanks for your explanation. It helped a lot. However, setting 
"spellcheck.maxResultsForSuggest" to a value higher than zero was not enough; I 
also had to set "spellcheck.alternativeTermCount". With that done, I now get 
suggestions when searching for 'mycet' (a misspelling of the Swedish word 
'mycket', that didn't return suggestions before).

However, I'm still not able to fully understand how to configure this 
properly, because with this change there are now other misspelled searches that 
no longer give suggestions. The problem here is stemming, I suspect: the main 
search fields use stemming, so in some cases one can get lots of results for 
spellings that don't exist in the index at all (or at least not in the spelling 
field). How can I configure this component so that those suggestions are still 
included? Do I need to set maxResultsForSuggest to a really high number, like 
Integer.MAX_VALUE? I feel that such a setting would defeat the purpose of that 
parameter, in a way. But I'm not sure how else to solve this.

Also, there is one other thing I wonder about the spelling suggestions that 
you might have the answer to. Is there a way to make the logic case 
insensitive, but the presentation case sensitive? For example, a search for 
'georg washington' now would return 'george washington' as a suggestion, but 
'George Washington' would be even better.

Regards
/Jimi


-----Original Message-----
From: alessandro.benedetti [mailto:abenede...@apache.org] 
Sent: Thursday, January 12, 2017 5:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Can't get spelling suggestions to work properly

Hi Jimi,
Taking a look at the *maxQueryFrequency* param:

Your understanding is correct.

1) We don't provide misspelled suggestions if the param is set to 1 and the 
term has a doc freq of at least 1.

2) We don't provide misspelled suggestions if the doc frequency of the term is 
greater than the max limit set.

Let us explore the code:

    if (suggestMode == SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX && docfreq > 0) {
      return new SuggestWord[0];
    }
    // If we are working in "not in index" mode with a document frequency > 0,
    // we get no misspelled corrections.

    int maxDoc = ir.maxDoc();

    if (maxQueryFrequency >= 1f && docfreq > maxQueryFrequency) {
      return new SuggestWord[0];
    } else if (docfreq > (int) Math.ceil(maxQueryFrequency * (float) maxDoc)) {
      return new SuggestWord[0];
    }
    // Then maxQueryFrequency, as you correctly stated, enters the game.
    
...
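
To make the two checks easier to experiment with, here is a small standalone 
sketch that mirrors only the conditions quoted above.  The class name, the 
method name "suppressed" and the sample numbers are made up for illustration; 
it is not the Lucene implementation itself.

```java
// Hypothetical stand-alone mirror of the quoted checks; not the Lucene class.
class QueryFrequencySketch {

    enum SuggestMode { SUGGEST_WHEN_NOT_IN_INDEX, SUGGEST_MORE_POPULAR, SUGGEST_ALWAYS }

    // Returns true when the quoted checks would return an empty suggestion
    // array for a term with the given document frequency.
    static boolean suppressed(SuggestMode mode, int docfreq,
                              float maxQueryFrequency, int maxDoc) {
        // Default mode: any term that is present in the index gets no suggestions.
        if (mode == SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX && docfreq > 0) {
            return true;
        }
        if (maxQueryFrequency >= 1f && docfreq > maxQueryFrequency) {
            // A value >= 1 is compared directly against the document frequency.
            return true;
        } else if (docfreq > (int) Math.ceil(maxQueryFrequency * (float) maxDoc)) {
            // Otherwise the limit is maxQueryFrequency * maxDoc, rounded up.
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed sample index of 100 documents.
        System.out.println(suppressed(SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX, 1, 0.5f, 100));
        System.out.println(suppressed(SuggestMode.SUGGEST_ALWAYS, 51, 0.5f, 100));
        System.out.println(suppressed(SuggestMode.SUGGEST_ALWAYS, 50, 0.5f, 100));
    }
}
```

With these assumed numbers, a term found in a single document is already 
suppressed in the default mode, while in SUGGEST_ALWAYS mode the cut-off sits at 
50 of the 100 documents.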

Let's explore how you can end up in the first scenario:

    if (maxResultsForSuggest == null || hits <= maxResultsForSuggest) {
      SuggestMode suggestMode = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX;
      if (onlyMorePopular) {
        suggestMode = SuggestMode.SUGGEST_MORE_POPULAR;
      } else if (alternativeTermCount > 0) {
        suggestMode = SuggestMode.SUGGEST_ALWAYS;
      }

You did not set maxResultsForSuggest (nor onlyMorePopular or 
alternativeTermCount), so you ended up with:
SuggestMode suggestMode = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX;
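
For reference, the selection logic can be tried out in isolation with a sketch 
like the one below.  It only mirrors the branch quoted above; the class name, 
the "chooseMode" method and the use of an empty Optional to stand for "no 
suggestions because too many hits came back" are assumptions for the example.

```java
import java.util.Optional;

// Hypothetical stand-alone mirror of the quoted branch; not the Solr component.
class SuggestModeSketch {

    enum SuggestMode { SUGGEST_WHEN_NOT_IN_INDEX, SUGGEST_MORE_POPULAR, SUGGEST_ALWAYS }

    // Returns the mode the quoted branch would pick, or an empty Optional when
    // the hit count exceeds maxResultsForSuggest (no suggestions requested).
    static Optional<SuggestMode> chooseMode(Integer maxResultsForSuggest, long hits,
                                            boolean onlyMorePopular,
                                            int alternativeTermCount) {
        if (maxResultsForSuggest == null || hits <= maxResultsForSuggest) {
            SuggestMode mode = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX;
            if (onlyMorePopular) {
                mode = SuggestMode.SUGGEST_MORE_POPULAR;
            } else if (alternativeTermCount > 0) {
                mode = SuggestMode.SUGGEST_ALWAYS;
            }
            return Optional.of(mode);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Nothing configured: the default mode applies.
        System.out.println(chooseMode(null, 100, false, 0));
        // alternativeTermCount > 0 switches to SUGGEST_ALWAYS.
        System.out.println(chooseMode(null, 100, false, 3));
        // More hits than maxResultsForSuggest: no mode is chosen at all.
        System.out.println(chooseMode(5, 6, false, 3));
    }
}
```

Note that onlyMorePopular takes precedence over alternativeTermCount when both 
are set, as the order of the branches shows.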

From the Solr Javadoc:

   * If left unspecified, the default behavior will prevail.  That is,
   * "correctlySpelled" will be false and suggestions will be returned only if
   * one or more of the query terms are absent from the dictionary and/or
   * index.  If set to zero, the "correctlySpelled" flag will be false only if
   * the response returns zero hits.  If set to a value greater than zero,
   * suggestions will be returned even if hits are returned (up to the
   * specified number).  This number also will serve as the threshold in
   * determining the value of "correctlySpelled".  Specifying a value greater
   * than zero is useful for creating "did-you-mean" suggestions for queries
   * that return a low number of hits.
   * </p>
   */
  public static final String SPELLCHECK_MAX_RESULTS_FOR_SUGGEST =
      SPELLCHECK_PREFIX + "maxResultsForSuggest";

You probably want to bypass the other parameters and just set the proper 
maxResultsForSuggest param for your spellchecker.

Cheers



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-t-get-spelling-suggestions-to-work-properly-tp4310079p4313685.html
Sent from the Solr - User mailing list archive at Nabble.com.
