Hi Jimi,

It looks like the suggest mode defaults to only returning results when the query term is not in your index.

I think setting spellcheck.onlyMorePopular=true, or a value for spellcheck.alternativeTermCount, should change the suggest mode to one that doesn't enforce that restriction. Apologies if you've already tried those, but it doesn't look like they're set in your query params below.
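
As a rough sketch of what that could look like on the request side, the Python snippet below appends both parameters to a trimmed-down version of your query. The base URL is taken from your post; the helper name and the value 5 for spellcheck.alternativeTermCount are purely illustrative assumptions, and either parameter on its own should be enough to test the idea.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL as it appears in the original post; adjust host and core as needed.
BASE_URL = "http://localhost:8080/solr/s2/select/"


def build_spellcheck_url(term, only_more_popular=True, alternative_term_count=None):
    """Build a select URL that carries the extra spellcheck parameters.

    Hypothetical helper for illustration only. Only a few of the original
    query parameters are reproduced here to keep the example short.
    """
    params = [
        ("q", term),
        ("defType", "edismax"),
        ("spellcheck", "true"),
        ("wt", "xml"),
    ]
    if only_more_popular:
        params.append(("spellcheck.onlyMorePopular", "true"))
    if alternative_term_count is not None:
        # The count itself is an assumption; tune it for your index.
        params.append(("spellcheck.alternativeTermCount", str(alternative_term_count)))
    return BASE_URL + "?" + urlencode(params)


url = build_spellcheck_url("mycet", alternative_term_count=5)
query = parse_qs(urlsplit(url).query)
print(url)
```

If the restriction is what blocks 'mycet', the response to this URL should contain a suggestions entry for it instead of only correctlySpelled=true.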

I know from recent experience that setting up spell checkers (and suggesters) can be an irritating process, and often involves a lot of trial-and-error testing!

All the best,

Matt


On 10/01/17 15:41, jimi.hulleg...@svensktnaringsliv.se wrote:
No one has any input on my post below about the spelling suggestions? I just find it a 
bit frustrating not being able to understand this feature better, and why it doesn't give 
the expected results. A built-in "explain" feature really would have helped.

/Jimi

-----Original Message-----
From: jimi.hulleg...@svensktnaringsliv.se 
[mailto:jimi.hulleg...@svensktnaringsliv.se]
Sent: Friday, December 16, 2016 9:58 PM
To: solr-user@lucene.apache.org
Subject: Can't get spelling suggestions to work properly

Hi,

I'm trying to add the spelling suggestion feature to our search, but I'm having 
problems getting suggestions on some misspellings.

For example, the Swedish word 'mycket' exists in ~14,000 of a total of ~40,000 
documents in our index.

A search for the incorrect spelling 'myket' (a missing 'c') gives several 
spelling suggestions, and the top one is 'mycket'. This is the wanted/expected 
behavior.

But a search for the incorrect spelling 'mycet' (a missing 'k') gives no 
spelling suggestions.

The only difference between these two searches is that the one that results in 
spelling suggestions had zero results, while the other one had two (2) results. 
These two documents contain this incorrect spelling ('mycet'). Can this be the 
cause of no spelling suggestions? But I have set 'maxQueryFrequency' to 0.001, 
and with 40,000 documents in the index that should mean that the word can exist 
in up to 40 documents, and since 2 is less than 40 I would argue that this word 
should be considered a spelling mistake. But for some reason the Solr 
spellchecker considers 'myket' an incorrect spelling, while 'mycet' is 
incorrectly considered a correct spelling.
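
To make the arithmetic above concrete, here is a small Python sketch of how a fractional maxQueryFrequency translates into a document count, together with a simplified model of the "only when not in index" restriction mentioned in the reply at the top of this thread. The function names are made up for illustration, and both the ceiling rounding and the order of the checks reflect one reading of Lucene's DirectSpellChecker, so treat them as assumptions rather than the actual implementation.

```python
import math


def max_query_doc_count(max_query_frequency, num_docs):
    """Highest document frequency a query term may have before it is
    treated as correctly spelled.

    Assumption: values below 1 are a fraction of the index size,
    values of 1 or more are absolute document counts.
    """
    if max_query_frequency >= 1:
        return int(max_query_frequency)
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(max_query_frequency * num_docs, 9))


def would_suggest(doc_freq, num_docs, max_query_frequency, only_when_not_in_index=True):
    """Simplified model of the two checks applied to the query term."""
    # Check 1 (assumed default mode): any term present in the index is skipped.
    if only_when_not_in_index and doc_freq > 0:
        return False
    # Check 2: terms more frequent than the maxQueryFrequency limit are skipped.
    return doc_freq <= max_query_doc_count(max_query_frequency, num_docs)


NUM_DOCS = 40_000
print(max_query_doc_count(0.001, NUM_DOCS))  # 40
print(would_suggest(0, NUM_DOCS, 0.001))     # 'myket', not in the index: True
print(would_suggest(2, NUM_DOCS, 0.001))     # 'mycet', in 2 documents: False
print(would_suggest(2, NUM_DOCS, 0.001, only_when_not_in_index=False))  # True
```

Under this model the frequency limit is not what stops 'mycet'; the first check is. Note also that the swedishSpelling checker in the configuration further down lists maxQueryFrequency as 0.01, which by the same arithmetic would allow up to 400 documents, so 2 is under the limit either way.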

Also, I tried with spellcheck.accuracy=0 just to rule out that the accuracy 
setting is too high, but that didn't help.

Can someone see what I'm doing wrong, or give some tips on configuration 
changes and/or how I can troubleshoot this? For example, is there any way to 
debug the spellchecker function?


Here are the searches:

Search for 'myket':

http://localhost:8080/solr/s2/select/?q=myket&rows=100&sort=score+desc&fl=*%2Cscore%2C%5Bexplain+style%3Dtext%5D&defType=edismax&qf=title%5E2&qf=swedishText1%5E1&spellcheck=true&spellcheck.accuracy=0&spellcheck.maxCollationTries=200&fq=%2Bactivatedate%3A%5B*+TO+NOW%5D+%2Bexpiredate%3A%5BNOW+TO+*%5D+%2B%28state%3Apublished+OR+state%3Adraft-published+OR+state%3Asubmitted-published+OR+state%3Aapproved-published%29&wt=xml&indent=true

Spellcheck output for 'myket':

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="myket">
      <int name="numFound">16</int>
      <int name="startOffset">0</int>
      <int name="endOffset">5</int>
      <int name="origFreq">0</int>
      <arr name="suggestion">
        <lst>
          <str name="word">mycket</str>
          <int name="freq">14039</int>
        </lst>
        [...]
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
    <lst name="collation">
      <str name="collationQuery">mycket</str>
      <int name="hits">14005</int>
      <lst name="misspellingsAndCorrections">
        <str name="myket">mycket</str>
      </lst>
    </lst>
    [...]
    </lst>
  </lst>
</lst>


Search for 'mycet':

http://localhost:8080/solr/s2/select/?q=mycet&rows=100&sort=score+desc&fl=*%2Cscore%2C%5Bexplain+style%3Dtext%5D&defType=edismax&qf=title%5E2&qf=swedishText1%5E1&spellcheck=true&spellcheck.accuracy=0&spellcheck.maxCollationTries=200&fq=%2Bactivatedate%3A%5B*+TO+NOW%5D+%2Bexpiredate%3A%5BNOW+TO+*%5D+%2B%28state%3Apublished+OR+state%3Adraft-published+OR+state%3Asubmitted-published+OR+state%3Aapproved-published%29&wt=xml&indent=true

Spellcheck output for 'mycet':

<lst name="spellcheck">
  <lst name="suggestions">
    <bool name="correctlySpelled">true</bool>
  </lst>
</lst>
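
When comparing responses like the two above, it can help to reduce each one to the few values that matter. The following Python sketch does that with the standard library XML parser; the function name and the sample string are illustrative, and it assumes the response layout shown in this post (correctlySpelled inside the suggestions list), which may differ in other Solr versions.

```python
import xml.etree.ElementTree as ET


def summarize_spellcheck(xml_text):
    """Extract correctlySpelled plus origFreq and top suggestion per term."""
    root = ET.fromstring(xml_text.strip())
    summary = {"correctlySpelled": None, "terms": {}}
    suggestions = root.find(".//lst[@name='suggestions']")
    if suggestions is None:
        return summary
    for child in suggestions:
        name = child.get("name")
        if child.tag == "bool" and name == "correctlySpelled":
            summary["correctlySpelled"] = child.text == "true"
        elif child.tag == "lst" and name != "collation":
            # Any other <lst> is assumed to be the entry for one query term.
            orig_freq = child.find("int[@name='origFreq']")
            top_word = child.find("arr[@name='suggestion']/lst/str[@name='word']")
            summary["terms"][name] = {
                "origFreq": int(orig_freq.text) if orig_freq is not None else None,
                "topSuggestion": top_word.text if top_word is not None else None,
            }
    return summary


# Shortened version of the 'myket' output, with the elided parts left out.
SAMPLE = """
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="myket">
      <int name="numFound">16</int>
      <int name="origFreq">0</int>
      <arr name="suggestion">
        <lst>
          <str name="word">mycket</str>
          <int name="freq">14039</int>
        </lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
  </lst>
</lst>
"""

result = summarize_spellcheck(SAMPLE)
print(result)
```

For the 'mycet' response the same function would return correctlySpelled as True with no terms at all, which makes the difference between the two cases easy to spot.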


Below is the relevant configuration.


The field type used for the spellchecker:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([.])" replacement=" " />
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.KeywordRepeatFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>


Parameters added to the standard request handler:

<str name="spellcheck.count">20</str>
<str name="spellcheck.dictionary">swedishSpelling</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollations">2</str>
<str name="spellcheck.maxCollationTries">10</str>

And the spellcheck component:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text</str>
  <lst name="spellchecker">
    <str name="name">swedishSpelling</str>
    <str name="field">swedishSpelling</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.0</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
    <float name="thresholdTokenFrequency">0.001</float>
  </lst>
  <lst name="spellchecker">
    <str name="name">englishSpelling</str>
    <str name="field">englishSpelling</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.0</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.001</float>
    <float name="thresholdTokenFrequency">0.0025</float>
  </lst>
</searchComponent>


Regards
/Jimi


--
Matt Pearce
Flax - Open Source Enterprise Search
www.flax.co.uk
