If you can not read this mail easily check this ticket:
https://issues.apache.org/jira/browse/SOLR-3106 This is a copy.
Regards!
Dalius Sidlauskas
On 08/02/12 15:44, Dalius Sidlauskas wrote:
Sorry for inaccurate title.
I have a 3 fields (dc_title, dc_title_unicode, dc_unicode_full)
containing same value:
<title xmlns="http://www.tei-c.org/ns/1.0">cal.lígraf</title>
and these fields are configured accordingly:
<fieldType name="xml" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ICUFoldingFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ICUFoldingFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="xml_unicode" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>
<fieldType name="xml_unicode_full" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>
And finally my search configuration:
<requestHandler name="dictionary" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">all</str>
<str name="defType">edismax</str>
<str name="mm">2<-25%</str>
<str name="qf">dc_title_unicode_full^2 dc_title_unicode^2 dc_title</str>
<int name="rows">10</int>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
I am trying to match the field with various search phrases (that are
valid). There are results:
# search phrase match? Comment
1 cal.lígra? yes
2 cal.ligra? no Changed í to i
3 cal.ligraf yes
4 calligra? no
The problem is the #2 attempt to match a data. The #3 works replacing
? with f.
One more thing. If * is used insted of ? other data is matched as
cal.lígrafia but not cal.lígraf...
Also I have spotted some logic missmatch in debug parsedQuery field:
*
cal·lígraf:* +DisjunctionMaxQuery((dc_title:*calligraf*^2.0 |
dc_title_unicode:cal·lígraf^3.0 | dc_title_unicode_full:cal·lígraf^3.0))
*cal·lígra?:*+DisjunctionMaxQuery((dc_title:*cal·lígra?*^2.0 |
dc_title_unicode:cal·lígra?^3.0 | dc_title_unicode_full:cal·lígra?^3.0))
Should the second be "*calligra?*" insted?*
*Environment:
Tomcat 7.0.25 (request encoding UTF-8)
Solr 3.5.0
Java 7 Oracle
Ubuntu 11.10