Hi,

I'm experimenting with the SpellCheckComponent and have therefore
used the example spell-checking configuration to try things out. My
solrconfig.xml looks like this:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">spell</str>

    <!-- Multiple "Spell Checkers" can be declared and used by this component -->

    <!-- a spellchecker built from a field of the main index -->
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
      <str name="distanceMeasure">internal</str>
      <!-- uncomment this to require suggestions to occur in 1% of the documents
        <float name="thresholdTokenFrequency">.01</float>
      -->
    </lst>

    <!-- a spellchecker that can break or combine words. See "/spell" handler below for usage -->
    <lst name="spellchecker">
      <str name="name">wordbreak</str>
      <str name="classname">solr.WordBreakSolrSpellChecker</str>
      <str name="field">spell</str>
      <str name="combineWords">true</str>
      <str name="breakWords">true</str>
      <int name="maxChanges">10</int>
    </lst>
  </searchComponent>

And I've added the spellcheck component to my /select request handler:

  <requestHandler name="/select" class="solr.SearchHandler">
    ...
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

In schema.xml, I populate the spellchecker's source field from the
name field:

  <field name="spell" type="spell" indexed="true" stored="true"
         required="false" multiValued="false"/>
  <copyField source="name" dest="spell" maxChars="30000"/>
  ...
  <fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>
  </fieldType>

Since I'm querying the /select request handler, I should get spellcheck
suggestions along with my results. However, I rarely get one. Examples:

  query: Sichtscheibe, spellcheck suggestion: Sichtscheiben (works)
  query: Sichtscheib, spellcheck suggestion: Sichtscheiben (works)
  query: ichtscheiben, no spellcheck suggestions
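
For anyone who wants to reproduce these queries, a request that addresses both spellcheckers defined above could be built roughly as in the sketch below. The host, port, and core name (`http://localhost:8983/solr/mycore`) and the helper name `build_spellcheck_url` are placeholders, not taken from the setup above, and the `spellcheck.count` value is arbitrary. `spellcheck`, `spellcheck.dictionary`, and `spellcheck.count` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_spellcheck_url(base_url, query):
    """Build a /select URL that turns on spellchecking for one query.

    base_url is a hypothetical core URL; adjust it to the real installation.
    """
    params = [
        ("q", query),
        # The component only runs when spellcheck=true is sent with the request
        # (or set as a default on the handler).
        ("spellcheck", "true"),
        # Name the spellcheckers to consult; these match the two
        # <str name="name"> entries in the searchComponent.
        ("spellcheck.dictionary", "default"),
        ("spellcheck.dictionary", "wordbreak"),
        # Arbitrary example value: maximum suggestions per term.
        ("spellcheck.count", "5"),
    ]
    return base_url + "/select?" + urlencode(params)


url = build_spellcheck_url("http://localhost:8983/solr/mycore", "ichtscheiben")
print(url)
```

Sending the three example queries through the same helper makes it easier to compare the `spellcheck` section of each response side by side.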

As far as I can tell, I only get suggestions when the query returns
actual search results. I get results for the first two examples because
the German StemFilterFactory reduces both "Sichtscheibe" and
"Sichtscheiben" to "Sichtscheib", so matches are found. However, the
third query should also produce a suggestion, since its Levenshtein
distance to "Sichtscheiben" (1) is smaller than that of the second
example (2).
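
The distance claim can be checked with a small sketch. This is a plain textbook Levenshtein edit distance, not Solr's internal distance measure (which may normalize or weight edits differently), and the function name `levenshtein` is only illustrative. The comparison is case-sensitive, matching the `spell` field type above, which has no lowercase filter.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    # prev[j] holds the distance between the first i-1 chars of a and first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


target = "Sichtscheiben"
for query in ("Sichtscheibe", "Sichtscheib", "ichtscheiben"):
    print(query, levenshtein(query, target))
# Sichtscheibe 1
# Sichtscheib 2
# ichtscheiben 1
```

So by raw edit distance alone, the third query is at least as close to "Sichtscheiben" as the two queries that do return a suggestion.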

Suggestions, improvements, corrections?

 
