Hello,

I'm using Solr 3.5 on Tomcat 6 and I'm having some problems with Unicode queries.

Here is my text field configuration:
<analyzer type="index">
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ElisionFilterFactory" articles="elisions.txt"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="French"/>
</analyzer>
<analyzer type="query">
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ElisionFilterFactory" articles="elisions.txt"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="French"/>
</analyzer>

When I perform this request: select/?q=hygiene sécurité&debugQuery=true
here is the debug info:
<str name="rawquerystring">hygiene sécurité</str>
<str name="querystring">hygiene sécurité</str>
<str name="parsedquery">searchText:hygien (searchText:sa searchText:curit)</str>
<str name="parsedquery_toString">searchText:hygien (searchText:sa searchText:curit)</str>

As you can see, the Unicode query fails: "sécurité" is parsed as
"searchText:sa searchText:curit" instead of "searchText:securite".
I've tried "ISOLatin1AccentFilterFactory" and I've changed the order of the
filters, but it made no difference :(
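
One possible cause that the analyzer chain alone would not explain, offered as an assumption and not confirmed on this setup: Tomcat 6 decodes request URIs as ISO-8859-1 unless told otherwise. If the UTF-8 bytes of "é" were read that way, they would arrive as "Ã©", the tokenizer would split on "©", and lowercasing plus ASCII folding would plausibly leave fragments like "sa" and "curit". A minimal sketch of the connector setting that would be worth checking in Tomcat's conf/server.xml (the port and timeout values below are the stock defaults, used here as placeholders and not taken from my actual configuration):

```xml
<!-- Hypothetical connector: only URIEncoding is the point of this sketch. -->
<!-- port, connectionTimeout and redirectPort are placeholder default values. -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>
```

If this is the cause, the query would also need to be sent percent-encoded as UTF-8 (q=hygiene%20s%C3%A9curit%C3%A9), and Tomcat restarted for the change to take effect.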

Any ideas?

Thanks

Frederic
