Hi Jimi,
It looks like the suggest mode defaults to only returning suggestions when
the query term is not in your index. Since 'mycet' occurs in two of your
documents, that would explain why it gets none.
I think setting spellcheck.onlyMorePopular=true, or setting a value for
spellcheck.alternativeTermCount, should change the suggest mode to one
that doesn't enforce that restriction. Apologies if you've already tried
those, but it doesn't look like they're set in your query params below.
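For what it's worth, here is roughly what I mean. This is an untested
sketch, not a known-good config: the extra line would sit alongside the
other spellcheck parameters in your request handler, and the value 5 is
just a placeholder I picked, not a recommendation.

```xml
<!-- Untested sketch; the value 5 is an arbitrary placeholder. -->
<!-- Option 1: also ask for suggestions for terms that exist in the index. -->
<str name="spellcheck.alternativeTermCount">5</str>
<!-- Option 2 (alternative): restrict suggestions to more popular terms. -->
<!-- <str name="spellcheck.onlyMorePopular">true</str> -->
```

You could also append &spellcheck.alternativeTermCount=5 to the query
URL to try it out before touching solrconfig.xml.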
I know from recent experience that setting up spell checkers (and
suggesters) can be an irritating process, and it often takes a lot of
trial and error!
All the best,
Matt
On 10/01/17 15:41, jimi.hulleg...@svensktnaringsliv.se wrote:
No one has any input on my post below about the spelling suggestions? I just find it a
bit frustrating not being able to understand this feature better, and why it doesn't give
the expected results. A built-in "explain" feature really would have helped.
/Jimi
-----Original Message-----
From: jimi.hulleg...@svensktnaringsliv.se
[mailto:jimi.hulleg...@svensktnaringsliv.se]
Sent: Friday, December 16, 2016 9:58 PM
To: solr-user@lucene.apache.org
Subject: Can't get spelling suggestions to work properly
Hi,
I'm trying to add the spelling suggestion feature to our search, but I'm having
problems getting suggestions on some misspellings.
For example, the Swedish word 'mycket' exists in ~14.000 of a total of ~40.000
documents in our index.
A search for the incorrect spelling 'myket' (a missing 'c') gives several
spelling suggestions, and the top one is 'mycket'. This is the wanted/expected
behavior.
But a search for the incorrect spelling 'mycet' (a missing 'k') gives no
spelling suggestions.
The only difference between these two searches is that the one that results in
spelling suggestions had zero results, while the other one had two (2) results.
These two documents contain this incorrect spelling ('mycet'). Can this be the
cause of no spelling suggestions? But I have set 'maxQueryFrequency' to 0.001,
and with 40.000 documents in the index that should mean that the word can exist
in up to 40 documents, and since 2 is less than 40 I argue that this word
should be considered a spelling mistake. But for some reason the Solr
spellchecker considers 'myket' an incorrect spelling, while 'mycet' is
incorrectly considered a correct spelling.
Also, I tried with spellcheck.accuracy=0 just to rule out that my accuracy
setting is too high, but that didn't help.
Can someone see what I'm doing wrong, or give some tips on configuration
changes and/or how I can troubleshoot this? For example, is there any way to
debug the spellchecker function?
Here are the searches:
Search for 'myket':
http://localhost:8080/solr/s2/select/?q=myket&rows=100&sort=score+desc&fl=*%2Cscore%2C%5Bexplain+style%3Dtext%5D&defType=edismax&qf=title%5E2&qf=swedishText1%5E1&spellcheck=true&spellcheck.accuracy=0&spellcheck.maxCollationTries=200&fq=%2Bactivatedate%3A%5B*+TO+NOW%5D+%2Bexpiredate%3A%5BNOW+TO+*%5D+%2B%28state%3Apublished+OR+state%3Adraft-published+OR+state%3Asubmitted-published+OR+state%3Aapproved-published%29&wt=xml&indent=true
Spellcheck output for 'myket':
<lst name="spellcheck">
<lst name="suggestions">
<lst name="myket">
<int name="numFound">16</int>
<int name="startOffset">0</int>
<int name="endOffset">5</int>
<int name="origFreq">0</int>
<arr name="suggestion">
<lst>
<str name="word">mycket</str>
<int name="freq">14039</int>
</lst>
[...]
</arr>
</lst>
<bool name="correctlySpelled">false</bool>
<lst name="collation">
<str name="collationQuery">mycket</str>
<int name="hits">14005</int>
<lst name="misspellingsAndCorrections">
<str name="myket">mycket</str>
</lst>
</lst>
[...]
</lst>
</lst>
</lst>
Search for 'mycet':
http://localhost:8080/solr/s2/select/?q=mycet&rows=100&sort=score+desc&fl=*%2Cscore%2C%5Bexplain+style%3Dtext%5D&defType=edismax&qf=title%5E2&qf=swedishText1%5E1&spellcheck=true&spellcheck.accuracy=0&spellcheck.maxCollationTries=200&fq=%2Bactivatedate%3A%5B*+TO+NOW%5D+%2Bexpiredate%3A%5BNOW+TO+*%5D+%2B%28state%3Apublished+OR+state%3Adraft-published+OR+state%3Asubmitted-published+OR+state%3Aapproved-published%29&wt=xml&indent=true
Spellcheck output for 'mycet':
<lst name="spellcheck">
<lst name="suggestions">
<bool name="correctlySpelled">true</bool>
</lst>
</lst>
Below is the relevant configuration.
The field type used for the spellchecker:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<charFilter class="solr.HTMLStripCharFilterFactory" />
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([.])" replacement=" " />
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.KeywordRepeatFilterFactory" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>
</fieldType>
Parameters added to the standard request handler:
<str name="spellcheck.count">20</str>
<str name="spellcheck.dictionary">swedishSpelling</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollations">2</str>
<str name="spellcheck.maxCollationTries">10</str>
And the spellcheck component:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text</str>
<lst name="spellchecker">
<str name="name">swedishSpelling</str>
<str name="field">swedishSpelling</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<str name="distanceMeasure">internal</str>
<float name="accuracy">0.0</float>
<int name="maxEdits">2</int>
<int name="minPrefix">0</int>
<int name="maxInspections">5</int>
<int name="minQueryLength">4</int>
<float name="maxQueryFrequency">0.01</float>
<float name="thresholdTokenFrequency">0.001</float>
</lst>
<lst name="spellchecker">
<str name="name">englishSpelling</str>
<str name="field">englishSpelling</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<str name="distanceMeasure">internal</str>
<float name="accuracy">0.0</float>
<int name="maxEdits">2</int>
<int name="minPrefix">0</int>
<int name="maxInspections">5</int>
<int name="minQueryLength">4</int>
<float name="maxQueryFrequency">0.001</float>
<float name="thresholdTokenFrequency">0.0025</float>
</lst>
</searchComponent>
Regards
/Jimi
--
Matt Pearce
Flax - Open Source Enterprise Search
www.flax.co.uk